source("custom_functions.R")Mucosal microbiota alterations in primary sclerosis cholangitis persist after liver transplantation and are associated with clinical features independently of geography
2. Hypothesis: PSC recurrence is associated with a specific composition of the gut microbiome
1 Introduction
1.1 About
rPSC vs non-rPSC vs Healthy analysis on merged data
Alpha diversity –> group effect, country effect, interaction effect
Beta diversity – PERMANOVA, PCA -> group effect, country effect, interaction effect
DAA ->
Group effect – linDA + MaAsLin2 intersection
Country effect – taxa with significant interaction effect were excluded based on individual post-hoc analysis. Taxa had to have the same direction in both countries
Pairwise comparisons:
rPSC vs non-rPSC –by this comparison, we will find the effect of recurrence (whether the microbial composition is associated with it)
non-rPSC vs Healthy – by this comparison, we will find if the non-rPSC samples are „closer“ to the healthy samples then rPSC samples in terms of the microbial composition. In other words, does non-rPSC samples have „healthy microbiome“?
rPSC vs Healthy – by this comparison, we will find how much are the rPSC samples different from healthy samples.
However, the rPSC and non-rPSC samples are not different at all (see results). Therefore, we can do EXCLUSIVE LEFT JOIN of differentially abundant taxa from the analyses (rPSC vs Healthy, non-rPSC vs Healthy)
Importing libraries and custom functions built for this analysis
1.2 Data Import
Importing ASV, taxa and metadata tables for both Czech and Norway samples.
Czech
path = "../../data/analysis_ready_data/ikem/"
asv_tab_ikem <- as.data.frame(fread(file.path(path,"asv_table_ikem.csv"),
check.names = FALSE))
taxa_tab_ikem <- as.data.frame(fread(file.path(path,"taxa_table_ikem.csv"),
check.names = FALSE))
metadata_ikem <- as.data.frame(fread(file.path(path,"metadata_ikem.csv"),
check.names = FALSE))Norway
path = "../../data/analysis_ready_data/norway/"
asv_tab_norway <- as.data.frame(fread(file.path(path,"asv_table_norway.csv"),
check.names = FALSE))
taxa_tab_norway <- as.data.frame(fread(file.path(path,"taxa_table_norway.csv"),
check.names = FALSE))
metadata_norway <- as.data.frame(fread(file.path(path,"metadata_norway.csv"),
check.names = FALSE))1.3 Merging data
Merging two countries to create whole dataset
asv_tab <- merge(asv_tab_ikem,asv_tab_norway,by="SeqID",all=TRUE)
taxa_tab <- merging_taxa_tables(taxa_tab_ikem,taxa_tab_norway)Merging at ASV level
Finding inconsistencies in taxonomy, trying to keep the ones that have better taxonomy assignment
Merging two countries based on the different matrices - Ileum, Colon.
Terminal ileum
ileum_data <- merging_data(asv_tab_1=asv_tab_ikem,
asv_tab_2=asv_tab_norway,
taxa_tab_1=taxa_tab_ikem,
taxa_tab_2=taxa_tab_norway,
metadata_1=metadata_ikem,
metadata_2=metadata_norway,
segment="TI",Q="Q2")Removing 1498 ASV(s)
Removing 1834 ASV(s)
Merging at ASV level
Finding inconsistencies in taxonomy, trying to keep the ones that have better taxonomy assignment
Removing 641 ASV(s)
ileum_asv_tab <- ileum_data[[1]]
ileum_taxa_tab <- ileum_data[[2]]
ileum_metadata <- ileum_data[[3]]Colon
colon_data <- merging_data(asv_tab_1=asv_tab_ikem,
asv_tab_2=asv_tab_norway,
taxa_tab_1=taxa_tab_ikem,
taxa_tab_2=taxa_tab_norway,
metadata_1=metadata_ikem,
metadata_2=metadata_norway,
segment="colon",Q="Q2")Removing 739 ASV(s)
Removing 266 ASV(s)
Merging at ASV level
Finding inconsistencies in taxonomy, trying to keep the ones that have better taxonomy assignment
Removing 1096 ASV(s)
colon_asv_tab <- colon_data[[1]]
colon_taxa_tab <- colon_data[[2]]
colon_metadata <- colon_data[[3]]2 Data Analysis - Terminal ileum
segment="terminal_ileum"2.1 Filtering
Rules:
sequencing depth > 10000
nearZeroVar() with default settings
Rarefaction Curve
path="../intermediate_files/rarecurves"
seq_depth_threshold <- 10000ps <- construct_phyloseq(ileum_asv_tab,ileum_taxa_tab,ileum_metadata)
rareres <- get_rarecurve(obj=ps, chunks=500)
save(rareres,file = file.path(path,"rarefaction_ileum.Rdata"))load(file.path(path,"rarefaction_ileum.Rdata"))
prare <- ggrarecurve(obj=rareres,
factorNames="Country",
indexNames=c("Observe")) +
theme_bw() +
theme(axis.text=element_text(size=8), panel.grid=element_blank(),
strip.background = element_rect(colour=NA,fill="grey"),
strip.text.x = element_text(face="bold")) +
geom_vline(xintercept = seq_depth_threshold,
linetype="dashed",
color = "red") +
xlim(0, 20000)The color has been set automatically, you can reset it manually by adding scale_color_manual(values=yourcolors)
prareLibrary size
read_counts(ileum_asv_tab, line = c(5000,10000))Scale for y is already present.
Adding another scale for y, which will replace the existing scale.
2.1.1 Sequencing depth
data_filt <- seq_depth_filtering(ileum_asv_tab,
ileum_taxa_tab,
ileum_metadata,
seq_depth_threshold = 10000)Removing 104 ASV(s)
filt_ileum_asv_tab <- data_filt[[1]]; alpha_ileum_asv_tab <- filt_ileum_asv_tab
filt_ileum_taxa_tab <- data_filt[[2]]; alpha_ileum_taxa_tab <- filt_ileum_taxa_tab
filt_ileum_metadata <- data_filt[[3]]; alpha_ileum_metadata <- filt_ileum_metadata
seq_step <- dim(filt_ileum_asv_tab)[1]Library size
read_counts(filt_ileum_asv_tab,line = c(5000,10000))Scale for y is already present.
Adding another scale for y, which will replace the existing scale.
2.1.2 NearZeroVar
data_filt <- nearzerovar_filtering(filt_ileum_asv_tab,
filt_ileum_taxa_tab,
filt_ileum_metadata)
filt_ileum_asv_tab <- data_filt[[1]]
filt_ileum_taxa_tab <- data_filt[[2]]
nearzero_step <- dim(filt_ileum_asv_tab)[1]Library size
read_counts(filt_ileum_asv_tab,line = c(5000,10000))Scale for y is already present.
Adding another scale for y, which will replace the existing scale.
2.1.3 Final Counts
final_counts_filtering(ileum_asv_tab,
filt_ileum_asv_tab,
filt_ileum_metadata,
seq_step, 0, nearzero_step) %>% `colnames<-`("Count") Count
Raw data: ASVs 4491
Raw data: Samples 220
Sequencing depth filt: ASVs 4387
Prevalence filt: ASVs 0
NearZeroVar filt: ASVs 575
Filt data: ASVs 575
Filt data: Samples 211
Filt data: Patients 211
Filt data: Patients.1 0
Filtered samples 9
Matrices TI
healthy 73
non-rPSC 105
rPSC 33
pre_ltx 0
post_ltx 0
ETOH 0
2.2 Alpha diversity
path = "../results/Q2/alpha_diversity"Calculation
# Construct MPSE object
alpha_ileum_metadata$Sample <- alpha_ileum_metadata$SampleID
ileum_mpse <- as.MPSE(construct_phyloseq(alpha_ileum_asv_tab,
alpha_ileum_taxa_tab,
alpha_ileum_metadata))
ileum_mpse %<>% mp_rrarefy(raresize = 10000,seed = 123)
# Calculate alpha diversity - rarefied counts
ileum_mpse %<>% mp_cal_alpha(.abundance=RareAbundance, force=TRUE)alpha_div_plots <- list()
# preparing data frame
alpha_data <- data.frame(SampleID=ileum_mpse$Sample.x,
Observe=ileum_mpse$Observe,
Shannon=ileum_mpse$Shannon,
Simpson=ileum_mpse$Simpson,
Pielou=ileum_mpse$Pielou,
Group=ileum_mpse$Group,
Country=ileum_mpse$Country,
Patient=ileum_mpse$Patient)
write.csv(alpha_data,file.path(path,paste0("alpha_indices_",segment,".csv")),
row.names = FALSE)2.2.1 Plots
p_boxplot_alpha <- alpha_diversity_countries(alpha_data)Using SampleID, Group, Country, Patient as id variables
Scale for y is already present.
Adding another scale for y, which will replace the existing scale.
Using SampleID, Group, Country, Patient as id variables
Scale for y is already present.
Adding another scale for y, which will replace the existing scale.
# save the results
alpha_div_plots[[paste(segment,"Country")]] <- p_boxplot_alpha
# see the results
p_boxplot_alpha2.2.2 Linear models
path = "../results/Q2/alpha_diversity"
alpha_data <- read.csv(file.path(path,paste0("alpha_indices_",segment,".csv")))Richness
results_model <- pairwise.lm(formula = "Observe ~ Group * Country",
factors=alpha_data$Group,
data=alpha_data)
# check interaction
if (!is.data.frame(results_model)){
results_model_observe <- results_model[[1]]
results_model_observe_emeans <- results_model[[2]]
} else {
results_model_observe <- results_model
results_model_observe_emeans <- NA
}
# save the results
pc_observed <- list();
pc_observed[[segment]] <- results_model_observe# see the results
knitr::kable(results_model_observe,digits = 3,
caption = "Raw results of linear model of richness estimation.")| Estimate | Std..Error | t.value | Pr…t.. | p.adj | sig | |
|---|---|---|---|---|---|---|
| non-rPSC vs GrouprPSC | -12.199 | 11.972 | -1.019 | 0.310 | 0.349 | |
| non-rPSC , rPSC - CZ vs NO | -21.486 | 11.452 | -1.876 | 0.063 | 0.141 | |
| non-rPSC vs GrouprPSC:CountryNO | 9.347 | 23.088 | 0.405 | 0.686 | 0.686 | |
| healthy vs GrouprPSC | -37.741 | 12.284 | -3.072 | 0.003 | 0.025 | * |
| healthy , rPSC - CZ vs NO | 11.203 | 10.973 | 1.021 | 0.310 | 0.349 | |
| healthy vs GrouprPSC:CountryNO | -23.342 | 21.355 | -1.093 | 0.277 | 0.349 | |
| healthy vs Groupnon-rPSC | -25.542 | 9.322 | -2.740 | 0.007 | 0.031 | * |
| healthy , non-rPSC - CZ vs NO | 11.203 | 10.933 | 1.025 | 0.307 | 0.349 | |
| healthy vs Groupnon-rPSC:CountryNO | -32.689 | 15.108 | -2.164 | 0.032 | 0.096 |
knitr::kable(results_model_observe_emeans,digits = 3,
caption = "Raw results of independent country analysis")| contrast | Country | estimate | SE | df | t.ratio | p.value |
|---|---|---|---|---|---|---|
| (non-rPSC) - healthy | CZ | -25.542 | 9.322 | 174 | -2.740 | 0.007 |
| (non-rPSC) - healthy | NO | -58.231 | 11.889 | 174 | -4.898 | 0.000 |
Shannon
results_model <- pairwise.lm(formula = "Shannon ~ Group * Country",
factors=alpha_data$Group,
data=alpha_data)
# check interaction
if (!is.data.frame(results_model)){
results_model_shannon <- results_model[[1]]
results_model_shannon_emeans <- results_model[[2]]
} else {
results_model_shannon <- results_model
results_model_shannon_emeans <- NA
}
# save the results
pc_shannon <- list();
pc_shannon[[segment]] <- as.data.frame(results_model_shannon)# see the results
knitr::kable(results_model_shannon,digits = 3,
caption = "Raw results of linear model of Shannon estimation.")| Estimate | Std..Error | t.value | Pr…t.. | p.adj | sig | |
|---|---|---|---|---|---|---|
| non-rPSC vs GrouprPSC | -0.102 | 0.146 | -0.696 | 0.487 | 0.731 | |
| non-rPSC , rPSC - CZ vs NO | -0.349 | 0.140 | -2.500 | 0.014 | 0.123 | |
| non-rPSC vs GrouprPSC:CountryNO | -0.006 | 0.281 | -0.021 | 0.983 | 0.983 | |
| healthy vs GrouprPSC | -0.210 | 0.140 | -1.501 | 0.137 | 0.321 | |
| healthy , rPSC - CZ vs NO | 0.005 | 0.125 | 0.043 | 0.966 | 0.983 | |
| healthy vs GrouprPSC:CountryNO | -0.360 | 0.244 | -1.478 | 0.142 | 0.321 | |
| healthy vs Groupnon-rPSC | -0.109 | 0.115 | -0.948 | 0.344 | 0.620 | |
| healthy , non-rPSC - CZ vs NO | 0.005 | 0.135 | 0.040 | 0.968 | 0.983 | |
| healthy vs Groupnon-rPSC:CountryNO | -0.354 | 0.186 | -1.905 | 0.058 | 0.263 |
knitr::kable(results_model_shannon_emeans,digits = 3,
caption = "Raw results of independent country analysis")| contrast | Country | estimate | SE | df | t.ratio | p.value |
|---|---|---|---|---|---|---|
| (non-rPSC) - healthy | CZ | -0.109 | 0.115 | 174 | -0.948 | 0.344 |
| (non-rPSC) - healthy | NO | -0.463 | 0.146 | 174 | -3.164 | 0.002 |
Simpson
results_model <- pairwise.lm(formula = "Simpson ~ Group * Country",
factors=alpha_data$Group,
data=alpha_data)
# check interaction
if (!is.data.frame(results_model)){
results_model_simpson <- results_model[[1]]
results_model_simpson_emeans <- results_model[[2]]
} else {
results_model_simpson <- results_model
results_model_simpson_emeans <- NA
}
# save the results
pc_simpson <- list();
pc_simpson[[segment]] <- as.data.frame(results_model_simpson)# see the results
knitr::kable(results_model_simpson,digits = 3,
caption = "Raw results of linear model of Simpson estimation.")| Estimate | Std..Error | t.value | Pr…t.. | p.adj | sig | |
|---|---|---|---|---|---|---|
| non-rPSC vs GrouprPSC | 0.000 | 0.023 | 0.019 | 0.985 | 0.985 | |
| non-rPSC , rPSC - CZ vs NO | -0.034 | 0.022 | -1.573 | 0.118 | 0.531 | |
| non-rPSC vs GrouprPSC:CountryNO | -0.029 | 0.043 | -0.661 | 0.510 | 0.767 | |
| healthy vs GrouprPSC | -0.009 | 0.018 | -0.500 | 0.618 | 0.767 | |
| healthy , rPSC - CZ vs NO | -0.008 | 0.016 | -0.501 | 0.617 | 0.767 | |
| healthy vs GrouprPSC:CountryNO | -0.055 | 0.031 | -1.779 | 0.078 | 0.531 | |
| healthy vs Groupnon-rPSC | -0.009 | 0.016 | -0.564 | 0.574 | 0.767 | |
| healthy , non-rPSC - CZ vs NO | -0.008 | 0.019 | -0.410 | 0.682 | 0.767 | |
| healthy vs Groupnon-rPSC:CountryNO | -0.026 | 0.027 | -0.975 | 0.331 | 0.767 |
knitr::kable(results_model_simpson_emeans,digits = 3,
caption = "Raw results of independent country analysis")| contrast | Country | estimate | SE | df | t.ratio | p.value |
|---|---|---|---|---|---|---|
| rPSC - healthy | CZ | -0.009 | 0.018 | 102 | -0.500 | 0.618 |
| rPSC - healthy | NO | -0.064 | 0.025 | 102 | -2.526 | 0.013 |
Pielou
results_model <- pairwise.lm(formula = "Pielou ~ Group * Country",
factors=alpha_data$Group,
data=alpha_data)
# check interaction
if (!is.data.frame(results_model)){
results_model_pielou <- results_model[[1]]
results_model_pielou_emeans <- results_model[[2]]
} else {
results_model_pielou <- results_model
results_model_pielou_emeans <- NA
}
# save the results
pc_pielou <- list();
pc_pielou[[segment]] <- as.data.frame(results_model_pielou)# see the results
knitr::kable(results_model_pielou,digits = 3,
caption = "Raw results of linear model of Pielou estimation.")| Estimate | Std..Error | t.value | Pr…t.. | p.adj | sig | |
|---|---|---|---|---|---|---|
| non-rPSC vs GrouprPSC | -0.004 | 0.022 | -0.184 | 0.854 | 0.946 | |
| non-rPSC , rPSC - CZ vs NO | -0.047 | 0.021 | -2.260 | 0.025 | 0.229 | |
| non-rPSC vs GrouprPSC:CountryNO | -0.005 | 0.042 | -0.108 | 0.914 | 0.946 | |
| healthy vs GrouprPSC | -0.001 | 0.021 | -0.067 | 0.946 | 0.946 | |
| healthy , rPSC - CZ vs NO | -0.009 | 0.018 | -0.463 | 0.645 | 0.946 | |
| healthy vs GrouprPSC:CountryNO | -0.043 | 0.036 | -1.195 | 0.235 | 0.705 | |
| healthy vs Groupnon-rPSC | 0.003 | 0.018 | 0.146 | 0.884 | 0.946 | |
| healthy , non-rPSC - CZ vs NO | -0.009 | 0.021 | -0.409 | 0.683 | 0.946 | |
| healthy vs Groupnon-rPSC:CountryNO | -0.038 | 0.029 | -1.329 | 0.185 | 0.705 |
knitr::kable(results_model_pielou_emeans,digits = 3,
caption = "Raw results of independent country analysis")Table: Raw results of independent country analysis
2.2.3 Saving results
alpha_list <- list(
Richness=pc_observed[[segment]] %>% rownames_to_column("Comparison"),
Shannon=pc_shannon[[segment]] %>% rownames_to_column("Comparison"),
Simpson=pc_simpson[[segment]] %>% rownames_to_column("Comparison"),
Pielou=pc_pielou[[segment]] %>% rownames_to_column("Comparison"))
write.xlsx(alpha_list,
file = file.path(path,paste0("alpha_diversity_results_",segment,".xlsx")))2.3 Beta diversity
Calculating Aitchison distance (euclidean distance on clr-transformed data), both at ASV and genus level.
2.3.1 Main analysis
Genus level, Aitchison distance
level="genus"
path = "../results/Q2/beta_diversity"
pairwise_aitchison_raw <- list()
pca_plots_list <- list()Aggregation, filtering
# Aggregation
genus_data <- aggregate_taxa(ileum_asv_tab,
ileum_taxa_tab,
taxonomic_level=level,
names=TRUE)
# Filtration
filt_data <- filtering_steps(genus_data[[1]],
genus_data[[2]],
ileum_metadata,
seq_depth_threshold=10000)Removing 2 ASV(s)
filt_ileum_genus_tab <- filt_data[[1]]
filt_ileum_genus_taxa <- filt_data[[2]]
filt_ileum_metadata <- filt_data[[3]]2.3.1.0.1 PERMANOVA
pairwise_df <- filt_ileum_genus_tab %>% column_to_rownames("SeqID") %>% t()
# main effect
pp_main <- pairwise.adonis(pairwise_df,
filt_ileum_metadata$Group,
covariate = filt_ileum_metadata$Country, sim.method = "robust.aitchison", p.adjust.m="BH")
# interaction
pp_int <- pairwise.adonis(pairwise_df,filt_ileum_metadata$Group,covariate = filt_ileum_metadata$Country, interaction = TRUE, sim.method = "robust.aitchison", p.adjust.m="BH")
# tidy the results
pp_factor <- pp_main[[1]]
pp_cov <- pp_main[[2]]
pp_fac.cov <- pp_int[[3]]
cols <- c("pairs","Df","SumsOfSqs", "F.Model","R2","p.value", "p.adjusted", "sig")
colnames(pp_factor) <- cols; colnames(pp_cov) <- cols; colnames(pp_fac.cov) <- cols;
# save raw results
pairwise_aitchison_raw[[paste(level, segment)]] <- rbind(pp_factor,pp_cov,pp_fac.cov)# see the results
knitr::kable(pp_factor,digits = 3,caption = "PERMANOVA, GROUP separation")| pairs | Df | SumsOfSqs | F.Model | R2 | p.value | p.adjusted | sig |
|---|---|---|---|---|---|---|---|
| rPSC vs non-rPSC | 1 | 215.579 | 1.208 | 0.009 | 0.107 | 0.107 | |
| rPSC vs healthy | 1 | 561.696 | 3.345 | 0.030 | 0.001 | 0.002 | ** |
| non-rPSC vs healthy | 1 | 760.242 | 4.431 | 0.024 | 0.001 | 0.002 | ** |
knitr::kable(pp_cov,digits = 3,caption = "PERMANOVA, COUNTRY separation")| pairs | Df | SumsOfSqs | F.Model | R2 | p.value | p.adjusted | sig |
|---|---|---|---|---|---|---|---|
| rPSC vs non-rPSC , Country | 1 | 601.929 | 3.372 | 0.024 | 0.001 | 0.001 | *** |
| rPSC vs healthy , Country | 1 | 496.841 | 2.959 | 0.027 | 0.001 | 0.001 | *** |
| non-rPSC vs healthy , Country | 1 | 624.474 | 3.640 | 0.020 | 0.001 | 0.001 | *** |
knitr::kable(pp_fac.cov,digits = 3,caption = "PERMANOVA, INTERACTION GROUP:Country")| pairs | Df | SumsOfSqs | F.Model | R2 | p.value | p.adjusted | sig |
|---|---|---|---|---|---|---|---|
| rPSC vs non-rPSC : Country | 1 | 156.176 | 0.874 | 0.006 | 0.788 | 0.788 | |
| rPSC vs healthy : Country | 1 | 164.743 | 0.981 | 0.009 | 0.464 | 0.696 | |
| non-rPSC vs healthy : Country | 1 | 209.954 | 1.225 | 0.007 | 0.089 | 0.267 |
Interaction check
interaction_sig <- pp_fac.cov$pairs[pp_fac.cov$p.adjusted < 0.05]
if (length(interaction_sig)>0){
for (i in 1:length(interaction_sig)){
group1 <- unlist(strsplit(interaction_sig[i],split = " vs "))[1]
group2 <- unlist(strsplit(interaction_sig[i],split = " vs "))[2]
group2 <- unlist(strsplit(group2,split = " : "))[1]
result_list <- adonis_postanalysis(x=pairwise_df,
factors = filt_ileum_metadata$Group,
covariate = filt_ileum_metadata$Country,
group1 = group1,
group2 = group2)
print(result_list)
}
}2.3.1.0.2 Plots
p <- pca_plot_custom(filt_ileum_genus_tab,
filt_ileum_genus_taxa,
filt_ileum_metadata,
show_boxplots = TRUE,
variable = "Group", size=3, show_legend=FALSE)
# save the results
pca_plots_list[[paste(segment,level,"custom")]] <- p
# see the results
p2.3.1.1 Saving results
write.xlsx(pairwise_aitchison_raw[[paste(level, segment)]],
file = file.path(path,
paste0("beta_diversity_results_", segment,".xlsx")))2.3.2 Supplementary analysis
supplements_beta <- list()2.3.2.1 Genus level
level="genus"2.3.2.1.1 Bray-Curtis
PERMANOVA
pairwise_df <- filt_ileum_genus_tab %>% column_to_rownames("SeqID") %>% t()
# main effect
pp_main <- pairwise.adonis(pairwise_df,filt_ileum_metadata$Group,covariate = filt_ileum_metadata$Country, sim.method = "bray", p.adjust.m="BH")
# interaction
pp_int <- pairwise.adonis(pairwise_df,filt_ileum_metadata$Group,covariate = filt_ileum_metadata$Country, interaction = TRUE, sim.method = "bray", p.adjust.m="BH")
# tidy the results
pp_factor <- pp_main[[1]]
pp_cov <- pp_main[[2]]
pp_fac.cov <- pp_int[[3]]
cols <- c("pairs","Df","SumsOfSqs", "F.Model","R2","p.value", "p.adjusted", "sig")
colnames(pp_factor) <- cols; colnames(pp_cov) <- cols; colnames(pp_fac.cov) <- cols;
# save raw results
supplements_beta[[paste("bray",level,segment)]] <- rbind(pp_factor,pp_cov,pp_fac.cov)# see the results
knitr::kable(pp_factor,digits = 3,caption = "PERMANOVA, GROUP separation")| pairs | Df | SumsOfSqs | F.Model | R2 | p.value | p.adjusted | sig |
|---|---|---|---|---|---|---|---|
| rPSC vs non-rPSC | 1 | 0.251 | 1.208 | 0.008 | 0.241 | 0.241 | |
| rPSC vs healthy | 1 | 0.813 | 4.432 | 0.039 | 0.001 | 0.002 | ** |
| non-rPSC vs healthy | 1 | 1.241 | 6.538 | 0.034 | 0.001 | 0.002 | ** |
knitr::kable(pp_cov,digits = 3,caption = "PERMANOVA, COUNTRY separation")| pairs | Df | SumsOfSqs | F.Model | R2 | p.value | p.adjusted | sig |
|---|---|---|---|---|---|---|---|
| rPSC vs non-rPSC , Country | 1 | 1.449 | 6.976 | 0.049 | 0.001 | 0.001 | *** |
| rPSC vs healthy , Country | 1 | 0.972 | 5.300 | 0.047 | 0.001 | 0.001 | *** |
| non-rPSC vs healthy , Country | 1 | 1.336 | 7.038 | 0.037 | 0.001 | 0.001 | *** |
knitr::kable(pp_fac.cov,digits = 3,caption = "PERMANOVA, INTERACTION GROUP:Country")| pairs | Df | SumsOfSqs | F.Model | R2 | p.value | p.adjusted | sig |
|---|---|---|---|---|---|---|---|
| rPSC vs non-rPSC : Country | 1 | 0.151 | 0.724 | 0.005 | 0.803 | 0.803 | |
| rPSC vs healthy : Country | 1 | 0.287 | 1.572 | 0.014 | 0.067 | 0.100 | |
| non-rPSC vs healthy : Country | 1 | 0.354 | 1.873 | 0.010 | 0.011 | 0.033 | * |
Interaction check
interaction_sig <- pp_fac.cov$pairs[pp_fac.cov$p.adjusted < 0.05]
for (i in 1:length(interaction_sig)){
group1 <- unlist(strsplit(interaction_sig[i],split = " vs "))[1]
group2 <- unlist(strsplit(interaction_sig[i],split = " vs "))[2]
group2 <- unlist(strsplit(group2,split = " : "))[1]
result_list <- adonis_postanalysis(x=pairwise_df,
factors = filt_ileum_metadata$Group,
covariate = filt_ileum_metadata$Country,
group1 = group1,
group2 = group2,
sim.method = 'bray')
print(result_list)
}$`non-rPSC_healthy_CZ`
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999
adonis2(formula = x1 ~ Fac, data = x2, permutations = perm)
Df SumOfSqs R2 F Pr(>F) p.adjusted
Fac 1 0.5823 0.02622 3.0424 0.001 0.001
Residual 113 21.6268 0.97378
Total 114 22.2090 1.00000
$`non-rPSC_healthy_NO`
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999
adonis2(formula = x1 ~ Fac, data = x2, permutations = perm)
Df SumOfSqs R2 F Pr(>F) p.adjusted
Fac 1 1.0126 0.08265 5.4958 0.001 0.001
Residual 61 11.2391 0.91735
Total 62 12.2517 1.00000
$`non-rPSC_CZ_vs_NO`
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999
adonis2(formula = x1 ~ Cov, data = x2, permutations = perm)
Df SumOfSqs R2 F Pr(>F) p.adjusted
Cov 1 1.0157 0.04597 4.9629 0.001 0.001
Residual 103 21.0797 0.95403
Total 104 22.0954 1.00000
$healthy_CZ_vs_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999
adonis2(formula = x1 ~ Cov, data = x2, permutations = perm)
Df SumOfSqs R2 F Pr(>F) p.adjusted
Cov 1 0.6741 0.0541 4.0607 0.001 0.001
Residual 71 11.7862 0.9459
Total 72 12.4603 1.0000
Plots
p <- pca_plot_custom(filt_ileum_genus_tab,
filt_ileum_genus_taxa,
filt_ileum_metadata,
measure = "bray",
show_boxplots = TRUE,
variable = "Group", size=3, show_legend=FALSE)
# save the results
supplements_beta[[paste("PCoA bray",level,segment)]] <- p
# see the results
p2.3.2.1.2 Jaccard
PERMANOVA
pairwise_df <- filt_ileum_genus_tab %>% column_to_rownames("SeqID") %>% t()
# main effect
pp_main <- pairwise.adonis(pairwise_df,filt_ileum_metadata$Group,covariate = filt_ileum_metadata$Country, sim.method = "jaccard", p.adjust.m="BH")
# interaction
pp_int <- pairwise.adonis(pairwise_df,filt_ileum_metadata$Group,covariate = filt_ileum_metadata$Country, interaction = TRUE, sim.method = "jaccard", p.adjust.m="BH")
# tidy the results
pp_factor <- pp_main[[1]]
pp_cov <- pp_main[[2]]
pp_fac.cov <- pp_int[[3]]
cols <- c("pairs","Df","SumsOfSqs", "F.Model","R2","p.value", "p.adjusted", "sig")
colnames(pp_factor) <- cols; colnames(pp_cov) <- cols; colnames(pp_fac.cov) <- cols;
# save raw results
supplements_beta[[paste("jaccard",level,segment)]] <- rbind(pp_factor,
pp_cov,
pp_fac.cov)# see the results
knitr::kable(pp_factor,digits = 3,caption = "PERMANOVA, GROUP separation")| pairs | Df | SumsOfSqs | F.Model | R2 | p.value | p.adjusted | sig |
|---|---|---|---|---|---|---|---|
| rPSC vs non-rPSC | 1 | 0.337 | 1.138 | 0.008 | 0.241 | 0.241 | |
| rPSC vs healthy | 1 | 0.887 | 3.269 | 0.030 | 0.001 | 0.002 | ** |
| non-rPSC vs healthy | 1 | 1.349 | 4.846 | 0.026 | 0.001 | 0.002 | ** |
knitr::kable(pp_cov,digits = 3,caption = "PERMANOVA, COUNTRY separation")| pairs | Df | SumsOfSqs | F.Model | R2 | p.value | p.adjusted | sig |
|---|---|---|---|---|---|---|---|
| rPSC vs non-rPSC , Country | 1 | 1.460 | 4.927 | 0.035 | 0.001 | 0.001 | *** |
| rPSC vs healthy , Country | 1 | 1.092 | 4.023 | 0.036 | 0.001 | 0.001 | *** |
| non-rPSC vs healthy , Country | 1 | 1.476 | 5.300 | 0.029 | 0.001 | 0.001 | *** |
knitr::kable(pp_fac.cov,digits = 3,caption = "PERMANOVA, INTERACTION GROUP:Country")| pairs | Df | SumsOfSqs | F.Model | R2 | p.value | p.adjusted | sig |
|---|---|---|---|---|---|---|---|
| rPSC vs non-rPSC : Country | 1 | 0.242 | 0.817 | 0.006 | 0.823 | 0.823 | |
| rPSC vs healthy : Country | 1 | 0.342 | 1.261 | 0.011 | 0.118 | 0.177 | |
| non-rPSC vs healthy : Country | 1 | 0.445 | 1.604 | 0.009 | 0.014 | 0.042 | * |
Interaction check
interaction_sig <- pp_fac.cov$pairs[pp_fac.cov$p.adjusted < 0.05]
for (i in 1:length(interaction_sig)){
group1 <- unlist(strsplit(interaction_sig[i],split = " vs "))[1]
group2 <- unlist(strsplit(interaction_sig[i],split = " vs "))[2]
group2 <- unlist(strsplit(group2,split = " : "))[1]
result_list <- adonis_postanalysis(x=pairwise_df,
factors = filt_ileum_metadata$Group,
covariate = filt_ileum_metadata$Country,
group1 = group1,
group2 = group2,
sim.method = 'jaccard')
print(result_list)
}$`non-rPSC_healthy_CZ`
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999
adonis2(formula = x1 ~ Fac, data = x2, permutations = perm)
Df SumOfSqs R2 F Pr(>F) p.adjusted
Fac 1 0.699 0.02174 2.5112 0.001 0.001
Residual 113 31.468 0.97826
Total 114 32.167 1.00000
$`non-rPSC_healthy_NO`
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999
adonis2(formula = x1 ~ Fac, data = x2, permutations = perm)
Df SumOfSqs R2 F Pr(>F) p.adjusted
Fac 1 1.0948 0.06116 3.9739 0.001 0.001
Residual 61 16.8056 0.93884
Total 62 17.9004 1.00000
$`non-rPSC_CZ_vs_NO`
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999
adonis2(formula = x1 ~ Cov, data = x2, permutations = perm)
Df SumOfSqs R2 F Pr(>F) p.adjusted
Cov 1 1.0948 0.03497 3.7326 0.001 0.001
Residual 103 30.2119 0.96503
Total 104 31.3067 1.00000
$healthy_CZ_vs_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999
adonis2(formula = x1 ~ Cov, data = x2, permutations = perm)
Df SumOfSqs R2 F Pr(>F) p.adjusted
Cov 1 0.8258 0.04372 3.2463 0.001 0.001
Residual 71 18.0612 0.95628
Total 72 18.8870 1.00000
Plots
p <- pca_plot_custom(filt_ileum_genus_tab,
filt_ileum_genus_taxa,
filt_ileum_metadata,
measure = "jaccard",
show_boxplots = TRUE,
variable = "Group", size=3, show_legend=FALSE)
# save the results
supplements_beta[[paste("PCoA jaccard",level,segment)]] <- p
# see the results
p2.3.2.2 ASV level
level="ASV"2.3.2.2.1 Aitchison
PERMANOVA
# preparing data frame
pairwise_df <- filt_ileum_asv_tab %>% column_to_rownames("SeqID") %>% t()
# main effect
pp_main <- pairwise.adonis(pairwise_df,filt_ileum_metadata$Group,covariate = filt_ileum_metadata$Country, sim.method = "robust.aitchison", p.adjust.m="BH")
# interaction
pp_int <- pairwise.adonis(pairwise_df,filt_ileum_metadata$Group,covariate = filt_ileum_metadata$Country, interaction = TRUE, sim.method = "robust.aitchison", p.adjust.m="BH")
pp_factor <- pp_main[[1]]
pp_cov <- pp_main[[2]]
pp_fac.cov <- pp_int[[3]]
cols <- c("pairs","Df","SumsOfSqs", "F.Model","R2","p.value", "p.adjusted", "sig")
colnames(pp_factor) <- cols; colnames(pp_cov) <- cols; colnames(pp_fac.cov) <- cols;
# save raw results
supplements_beta[[paste("aitchison",level,segment)]] <- rbind(pp_factor,
pp_cov,
pp_fac.cov)# see the results
knitr::kable(pp_factor,digits = 3,caption = "PERMANOVA, GROUP separation")| pairs | Df | SumsOfSqs | F.Model | R2 | p.value | p.adjusted | sig |
|---|---|---|---|---|---|---|---|
| rPSC vs non-rPSC | 1 | 316.296 | 1.176 | 0.008 | 0.071 | 0.071 | |
| rPSC vs healthy | 1 | 830.867 | 2.897 | 0.027 | 0.001 | 0.002 | ** |
| non-rPSC vs healthy | 1 | 1156.137 | 4.137 | 0.023 | 0.001 | 0.002 | ** |
knitr::kable(pp_cov,digits = 3,caption = "PERMANOVA, COUNTRY separation")| pairs | Df | SumsOfSqs | F.Model | R2 | p.value | p.adjusted | sig |
|---|---|---|---|---|---|---|---|
| rPSC vs non-rPSC , Country | 1 | 639.519 | 2.378 | 0.017 | 0.001 | 0.001 | *** |
| rPSC vs healthy , Country | 1 | 598.496 | 2.087 | 0.019 | 0.001 | 0.001 | *** |
| non-rPSC vs healthy , Country | 1 | 752.840 | 2.694 | 0.015 | 0.001 | 0.001 | *** |
knitr::kable(pp_fac.cov,digits = 3,caption = "PERMANOVA, INTERACTION GROUP:Country")| pairs | Df | SumsOfSqs | F.Model | R2 | p.value | p.adjusted | sig |
|---|---|---|---|---|---|---|---|
| rPSC vs non-rPSC : Country | 1 | 222.179 | 0.825 | 0.006 | 0.960 | 0.96 | |
| rPSC vs healthy : Country | 1 | 262.783 | 0.916 | 0.008 | 0.746 | 0.96 | |
| non-rPSC vs healthy : Country | 1 | 291.928 | 1.045 | 0.006 | 0.333 | 0.96 |
Interaction check
interaction_sig <- pp_fac.cov$pairs[pp_fac.cov$p.adjusted < 0.05]
if (length(interaction_sig)>0){
for (i in 1:length(interaction_sig)){
group1 <- unlist(strsplit(interaction_sig[i],split = " vs "))[1]
group2 <- unlist(strsplit(interaction_sig[i],split = " vs "))[2]
group2 <- unlist(strsplit(group2,split = " : "))[1]
result_list <- adonis_postanalysis(x=pairwise_df,
factors = filt_ileum_metadata$Group,
covariate = filt_ileum_metadata$Country,
group1 = group1,
group2 = group2)
print(result_list)
}
}PCoA
p <- pca_plot_custom(filt_ileum_asv_tab,
filt_ileum_taxa_tab,
filt_ileum_metadata,
show_boxplots = TRUE,
variable = "Group", size=3, show_legend=FALSE)
# save the results
supplements_beta[[paste("PCoA aitchison",level,segment)]] <- p
# see the results
p2.3.2.2.2 Bray-Curtis
PERMANOVA
# preparing data frame
pairwise_df <- filt_ileum_asv_tab %>% column_to_rownames("SeqID") %>% t()
# main effect
pp_main <- pairwise.adonis(pairwise_df,filt_ileum_metadata$Group,covariate = filt_ileum_metadata$Country, sim.method = "bray", p.adjust.m="BH")
# interaction
pp_int <- pairwise.adonis(pairwise_df,filt_ileum_metadata$Group,covariate = filt_ileum_metadata$Country, interaction = TRUE, sim.method = "bray", p.adjust.m="BH")
pp_factor <- pp_main[[1]]
pp_cov <- pp_main[[2]]
pp_fac.cov <- pp_int[[3]]
cols <- c("pairs","Df","SumsOfSqs", "F.Model","R2","p.value", "p.adjusted", "sig")
colnames(pp_factor) <- cols; colnames(pp_cov) <- cols; colnames(pp_fac.cov) <- cols;
# save raw results
supplements_beta[[paste("bray",level,segment)]] <- rbind(pp_factor,
pp_cov,
pp_fac.cov)# see the results
knitr::kable(pp_factor,digits = 3,caption = "PERMANOVA, GROUP separation")| pairs | Df | SumsOfSqs | F.Model | R2 | p.value | p.adjusted | sig |
|---|---|---|---|---|---|---|---|
| rPSC vs non-rPSC | 1 | 0.505 | 1.557 | 0.011 | 0.026 | 0.026 | * |
| rPSC vs healthy | 1 | 1.342 | 4.364 | 0.039 | 0.001 | 0.002 | ** |
| non-rPSC vs healthy | 1 | 1.979 | 6.424 | 0.034 | 0.001 | 0.002 | ** |
knitr::kable(pp_cov,digits = 3,caption = "PERMANOVA, COUNTRY separation")| pairs | Df | SumsOfSqs | F.Model | R2 | p.value | p.adjusted | sig |
|---|---|---|---|---|---|---|---|
| rPSC vs non-rPSC , Country | 1 | 1.444 | 4.451 | 0.032 | 0.001 | 0.001 | *** |
| rPSC vs healthy , Country | 1 | 1.059 | 3.444 | 0.031 | 0.001 | 0.001 | *** |
| non-rPSC vs healthy , Country | 1 | 1.394 | 4.525 | 0.024 | 0.001 | 0.001 | *** |
knitr::kable(pp_fac.cov,digits = 3,caption = "PERMANOVA, INTERACTION GROUP:Country")| pairs | Df | SumsOfSqs | F.Model | R2 | p.value | p.adjusted | sig |
|---|---|---|---|---|---|---|---|
| rPSC vs non-rPSC : Country | 1 | 0.333 | 1.027 | 0.007 | 0.386 | 0.386 | |
| rPSC vs healthy : Country | 1 | 0.390 | 1.272 | 0.011 | 0.093 | 0.140 | |
| non-rPSC vs healthy : Country | 1 | 0.447 | 1.456 | 0.008 | 0.036 | 0.108 |
Interaction check
interaction_sig <- pp_fac.cov$pairs[pp_fac.cov$p.adjusted < 0.05]
if (length(interaction_sig)>0){
for (i in 1:length(interaction_sig)){
group1 <- unlist(strsplit(interaction_sig[i],split = " vs "))[1]
group2 <- unlist(strsplit(interaction_sig[i],split = " vs "))[2]
group2 <- unlist(strsplit(group2,split = " : "))[1]
result_list <- adonis_postanalysis(x=pairwise_df,
factors = filt_ileum_metadata$Group,
covariate = filt_ileum_metadata$Country,
group1 = group1,
group2 = group2,
sim.method = 'bray')
print(result_list)
}
}PCoA
p <- pca_plot_custom(filt_ileum_asv_tab,
filt_ileum_taxa_tab,
filt_ileum_metadata,
measure = "bray",
show_boxplots = TRUE,
variable = "Group", size=3, show_legend=FALSE)
# save the results
supplements_beta[[paste("PCoA bray",level,segment)]] <- p
# see the results
p2.3.2.2.3 Jaccard
PERMANOVA
# preparing data frame
pairwise_df <- filt_ileum_asv_tab %>% column_to_rownames("SeqID") %>% t()
# main effect
pp_main <- pairwise.adonis(pairwise_df,filt_ileum_metadata$Group,covariate = filt_ileum_metadata$Country, sim.method = "jaccard", p.adjust.m="BH")
# interaction
pp_int <- pairwise.adonis(pairwise_df,filt_ileum_metadata$Group,covariate = filt_ileum_metadata$Country, interaction = TRUE, sim.method = "jaccard", p.adjust.m="BH")
pp_factor <- pp_main[[1]]
pp_cov <- pp_main[[2]]
pp_fac.cov <- pp_int[[3]]
cols <- c("pairs","Df","SumsOfSqs", "F.Model","R2","p.value", "p.adjusted", "sig")
colnames(pp_factor) <- cols; colnames(pp_cov) <- cols; colnames(pp_fac.cov) <- cols;
# save raw results
supplements_beta[[paste("jaccard",level,segment)]] <- rbind(pp_factor,
pp_cov,
pp_fac.cov)# see the results
knitr::kable(pp_factor,digits = 3,caption = "PERMANOVA, GROUP separation")| pairs | Df | SumsOfSqs | F.Model | R2 | p.value | p.adjusted | sig |
|---|---|---|---|---|---|---|---|
| rPSC vs non-rPSC | 1 | 0.518 | 1.318 | 0.009 | 0.029 | 0.029 | * |
| rPSC vs healthy | 1 | 1.117 | 2.934 | 0.027 | 0.001 | 0.002 | ** |
| non-rPSC vs healthy | 1 | 1.607 | 4.216 | 0.023 | 0.001 | 0.002 | ** |
knitr::kable(pp_cov,digits = 3,caption = "PERMANOVA, COUNTRY separation")| pairs | Df | SumsOfSqs | F.Model | R2 | p.value | p.adjusted | sig |
|---|---|---|---|---|---|---|---|
| rPSC vs non-rPSC , Country | 1 | 1.214 | 3.089 | 0.022 | 0.001 | 0.001 | *** |
| rPSC vs healthy , Country | 1 | 0.939 | 2.467 | 0.023 | 0.001 | 0.001 | *** |
| non-rPSC vs healthy , Country | 1 | 1.203 | 3.157 | 0.017 | 0.001 | 0.001 | *** |
knitr::kable(pp_fac.cov,digits = 3,caption = "PERMANOVA, INTERACTION GROUP:Country")| pairs | Df | SumsOfSqs | F.Model | R2 | p.value | p.adjusted | sig |
|---|---|---|---|---|---|---|---|
| rPSC vs non-rPSC : Country | 1 | 0.390 | 0.992 | 0.007 | 0.469 | 0.469 | |
| rPSC vs healthy : Country | 1 | 0.430 | 1.130 | 0.010 | 0.131 | 0.196 | |
| non-rPSC vs healthy : Country | 1 | 0.501 | 1.318 | 0.007 | 0.026 | 0.078 |
Interaction check
interaction_sig <- pp_fac.cov$pairs[pp_fac.cov$p.adjusted < 0.05]
if (length(interaction_sig)>0){
for (i in 1:length(interaction_sig)){
group1 <- unlist(strsplit(interaction_sig[i],split = " vs "))[1]
group2 <- unlist(strsplit(interaction_sig[i],split = " vs "))[2]
group2 <- unlist(strsplit(group2,split = " : "))[1]
result_list <- adonis_postanalysis(x=pairwise_df,
factors = filt_ileum_metadata$Group,
covariate = filt_ileum_metadata$Country,
group1 = group1,
group2 = group2,
sim.method = 'jaccard')
print(result_list)
}
}PCoA
p <- pca_plot_custom(filt_ileum_asv_tab,
filt_ileum_taxa_tab,
filt_ileum_metadata,
measure = "jaccard",
show_boxplots = TRUE,
variable = "Group", size=3, show_legend=FALSE)
# save the results
supplements_beta[[paste("PCoA jaccard",level,segment)]] <- p
# see the results
p2.3.2.3 Saving results
write.xlsx(supplements_beta[!grepl("PCoA",names(supplements_beta))],
file = file.path(path,
paste0("supplements_beta_diversity_", segment,".xlsx")))2.4 Univariate Analysis
2.4.1 Main analysis
level="genus"
# needed paths
path = "../results/Q2/univariate_analysis"
path_maaslin=file.path("../intermediate_files/maaslin/Q2",level)
# variables
raw_linda_results_genus <- list();
raw_linda_results_genus[[segment]] <- list()
linda_results_genus <- list();
linda_results_genus[[segment]] <- list()
# country and interaction problems
list_country_union <- list()
list_intersections <- list()
list_venns <- list()
uni_statistics <- list()
# workbook for final df
wb <- createWorkbook()
# rPSC effect
rpsc_effect <- list()Aggregate taxa
genus_data <- aggregate_taxa(ileum_asv_tab,
ileum_taxa_tab,
taxonomic_level = level)
ileum_genus_tab <- genus_data[[1]]
ileum_genus_taxa_tab <- genus_data[[2]]
ileum_genus_asv_taxa_tab <- create_asv_taxa_table(ileum_genus_tab,
ileum_genus_taxa_tab)2.4.1.1 rPSC vs non-rPSC
group <- c("non-rPSC","rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])2.4.1.1.1 linDA
taxa_to_test <- read.xlsx("../results/Q1/univariate_analysis/significant_taxa_terminal_ileum.xlsx", sheet=" genus healthy vs pre_ltx")$SeqID
ileum_genus_tab_2 <- ileum_genus_tab %>% dplyr::filter(SeqID %in% taxa_to_test)
ileum_genus_taxa_tab_2 <- ileum_genus_taxa_tab %>% dplyr::filter(SeqID %in% taxa_to_test)
# prepare the data
linda_data <- binomial_prep(ileum_genus_tab_2,
ileum_genus_taxa_tab_2,
ileum_metadata,
group, usage="linDA", filtering = FALSE)
filt_ileum_uni_data <- linda_data[[1]]
filt_ileum_uni_taxa <- linda_data[[2]]
filt_ileum_uni_metadata <- linda_data[[3]]
# fit the model
linda.obj <- linda(filt_ileum_uni_data,
filt_ileum_uni_metadata,
formula = '~ Group * Country')0 features are filtered!
The filtered data has 56 samples and 25 features will be tested!
Imputation approach is used.
Fit linear models ...
Completed.
linda.output <- linda.obj$output
linda.output <- linda_renaming(linda.output, group)
# save the results
group1 <- paste0(group[1], " vs ","Group",group[2])
group2 <- paste0(group[1], " , ",group[2], " - CZ vs NO")
group3 <- paste0(group[1], " vs ","Group",group[2], ":CountryNO")
for (grp in c(group1,group2,group3)){
raw_linda_results_genus[[segment]][[grp]] <-
rawlinda.df(linda.output,
grp,
filt_ileum_uni_data,
filt_ileum_uni_taxa)
linda_results_genus[[segment]][[grp]] <-
linda.df(linda.output,
grp,
filt_ileum_uni_data,
filt_ileum_uni_taxa)
}# volcano plot
volcano_1 <- volcano_plot_linda(linda.output,
group1,
taxa_table = filt_ileum_uni_taxa) +
ggtitle(comparison_title)
volcano_2 <- volcano_plot_linda(linda.output, group2,
taxa_table = filt_ileum_uni_taxa) +
ggtitle("NO vs CZ")
volcano_3 <- volcano_plot_linda(linda.output, group3,
taxa_table = filt_ileum_uni_taxa) +
ggtitle("Interaction effect")
volcano <- ggarrange(volcano_1,volcano_2,volcano_3, ncol=3)
volcano2.4.1.1.2 MaAsLin2
volcano1 <- volcano_plot_maaslin(fit_data,filt_ileum_uni_taxa) +
ggtitle(comparison_title)
volcano2 <- volcano_plot_maaslin(fit_data,filt_ileum_uni_taxa,variable="Country") +
ggtitle("NO vs CZ")
volcano <- ggarrange(volcano1,volcano2, ncol=2)
volcano2.4.1.1.3 Group - Intersection
intersection_results <- group_intersection(group, list_intersections, list_venns,
linda.output, fit_data,
raw_linda_results_genus,
segment = segment,
level=level)
list_intersections <- intersection_results[[1]]
list_venns <- intersection_results[[2]]
venn <- intersection_results[[3]]
# show the results
venn2.4.1.1.4 Interaction effect
list_interaction_significant <- country_interaction(group,
linda.output,
list_intersections,
filt_ileum_uni_data,
filt_ileum_uni_metadata,
segment=segment,
level=level)
# see the result
## significant interaction effect
list_interaction_significant[[1]]NULL
## results for czech cohort
list_interaction_significant[[2]][1] NA
## results for norwegian cohort
list_interaction_significant[[3]][1] NA
Removing problematic taxa
list_intersections <- removing_interaction_problems(group,
list_interaction_significant,
list_intersections,
segment=segment,
level=level)2.4.1.1.5 Basic statistics
uni_df <- merge(basic_univariate_statistics(linda_data,group),
raw_linda_results_genus[[segment]][[group1]],
by="SeqID",all=TRUE)[1] "non-rPSC"
[1] "rPSC"
[1] "non-rPSC"
[1] "rPSC"
uni_df[["final_sig"]] <- uni_df$SeqID %in% list_intersections[[paste(segment,level,comparison_name)]][["SeqID"]]
uni_statistics[[segment]][[paste(level,comparison_name)]] <- uni_df
# for comparison
new_name <- comparison_name
addWorksheet(wb, sheetName = new_name)
writeData(wb, sheet = new_name, uni_df, rowNames=FALSE)2.4.1.1.6 Prevalence testing
uni_df$prevalence_diff <- abs(uni_df$`PREVALENCE_ rPSC` - uni_df$`PREVALENCE_ non-rPSC`)
#uni_df %<>% dplyr::filter(prevalence_diff >=0.2)
#filt_colon_uni_data <- filt_colon_uni_data[uni_df$SeqID,]
filt_ileum_uni_data <- ifelse(filt_ileum_uni_data>0,"YES","NO")
filt_ileum_uni_data %<>% t() %>% as.data.frame()
#filt_colon_uni_taxa %<>% dplyr::filter(SeqID %in% uni_df$SeqID)
filt_ileum_uni_data <- merge(filt_ileum_uni_data %>% rownames_to_column("SampleID"),filt_ileum_metadata,by="SampleID",all.x=TRUE) %>% column_to_rownames("SampleID")
p_values <- c()
for (taxon in uni_df$SeqID){
data <- filt_ileum_uni_data %>%
group_by(!!sym(taxon), Group) %>%
count() %>%
group_by(!!sym(taxon), Group) %>%
summarise(n = sum(n), .groups = "drop") %>%
pivot_wider(names_from = Group, values_from = n) %>%
column_to_rownames(taxon) %>%
as.matrix()
if (nrow(data)>1){
data[is.na(data)] <- 0
chi_res <- chisq.test(data)
chi_res$expected
#p_values <- c(p_values,chi_res$p.value)
test <- fisher.test(data)
test$p.value
p_values <- c(p_values,test$p.value)
} else p_values <- c(p_values,NA)
}Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
p_values_adjusted <- p.adjust(p_values,method = "BH")
names(p_values_adjusted) <- uni_df$SeqID
names(p_values) <- uni_df$SeqID
p_values_adjusted[which(p_values_adjusted<0.05)]named numeric(0)
asv_tab <- as.data.frame(rrarefy(
linda_data[[1]] %>% t(),
sample=10000)) %>% t() %>% as.data.frame()
asv_tab <- ifelse(asv_tab>10,"YES","NO")
asv_tab %<>% t() %>% as.data.frame()
asv_tab <- merge(asv_tab %>% rownames_to_column("SampleID"),filt_ileum_metadata,by="SampleID",all.x=TRUE) %>% column_to_rownames("SampleID")
p_values <- c()
for (taxon in uni_df$SeqID){
data <- asv_tab %>%
group_by(!!sym(taxon), Group) %>%
count() %>%
group_by(!!sym(taxon), Group) %>%
summarise(n = sum(n), .groups = "drop") %>%
pivot_wider(names_from = Group, values_from = n) %>%
column_to_rownames(taxon) %>%
as.matrix()
if (nrow(data)>1){
data[is.na(data)] <- 0
chi_res <- chisq.test(data)
chi_res$expected
test <- fisher.test(data)
test$p.value
chi_res$p.value
p_values <- c(p_values,test$p.value)
} else p_values <- c(p_values,NA)
}Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
p_values_adjusted <- p.adjust(p_values,method = "BH")
names(p_values_adjusted) <- uni_df$SeqID
names(p_values) <- uni_df$SeqID
p_values_adjusted[which(p_values_adjusted<0.05)]named numeric(0)
df <- data.frame(SeqID=uni_df$SeqID,
p=p_values,
p_adj=p_values_adjusted)prevalence_df <- uni_df[,c("SeqID","PREVALENCE_ non-rPSC","PREVALENCE_ rPSC")]
prevalence_df <- merge(prevalence_df,df,by="SeqID",all=TRUE)
write.xlsx(prevalence_df,file.path(path,paste0("supplements_prevalence_",segment,".xlsx")))2.4.1.2 rPSC vs healthy
group <- c("healthy","rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])2.4.1.2.1 linDA
# prepare the data
linda_data <- binomial_prep(ileum_genus_tab,
ileum_genus_taxa_tab,
ileum_metadata,
group, usage="linDA")Removing 67 ASV(s)
Removing 4 ASV(s)
filt_ileum_uni_data <- linda_data[[1]]
filt_ileum_uni_taxa <- linda_data[[2]]
filt_ileum_uni_metadata <- linda_data[[3]]
# fit the model
linda.obj <- linda(filt_ileum_uni_data,
filt_ileum_uni_metadata,
formula = '~ Group * Country')0 features are filtered!
The filtered data has 106 samples and 207 features will be tested!
Imputation approach is used.
Fit linear models ...
Completed.
linda.output <- linda.obj$output
linda.output <- linda_renaming(linda.output, group)
# save the results
group1 <- paste0(group[1], " vs ","Group",group[2])
group2 <- paste0(group[1], " , ",group[2], " - CZ vs NO")
group3 <- paste0(group[1], " vs ","Group",group[2], ":CountryNO")
for (grp in c(group1,group2,group3)){
raw_linda_results_genus[[segment]][[grp]] <-
rawlinda.df(linda.output,
grp,
filt_ileum_uni_data,
filt_ileum_uni_taxa)
linda_results_genus[[segment]][[grp]] <-
linda.df(linda.output,
grp,
filt_ileum_uni_data,
filt_ileum_uni_taxa)
}# volcano plot
volcano_1 <- volcano_plot_linda(linda.output,
group1,
taxa_table = filt_ileum_uni_taxa) +
ggtitle(comparison_title)
volcano_2 <- volcano_plot_linda(linda.output, group2,
taxa_table = filt_ileum_uni_taxa) +
ggtitle("NO vs CZ")
volcano_3 <- volcano_plot_linda(linda.output, group3,
taxa_table = filt_ileum_uni_taxa) +
ggtitle("Interaction effect")
volcano <- ggarrange(volcano_1,volcano_2,volcano_3, ncol=3)2.4.1.2.2 MaAsLin2
volcano1 <- volcano_plot_maaslin(fit_data,filt_ileum_uni_taxa) +
ggtitle(comparison_title)
volcano2 <- volcano_plot_maaslin(fit_data,filt_ileum_uni_taxa,variable="Country") +
ggtitle("NO vs CZ")
volcano <- ggarrange(volcano1,volcano2, ncol=2)
volcano2.4.1.2.3 Group - Intersection
intersection_results <- group_intersection(group, list_intersections, list_venns,
linda.output, fit_data,
raw_linda_results_genus,
segment = segment,
level=level)
list_intersections <- intersection_results[[1]]
list_venns <- intersection_results[[2]]
venn <- intersection_results[[3]]
# show the results
venn2.4.1.2.4 Interaction effect
list_interaction_significant <- country_interaction(group,
linda.output,
list_intersections,
filt_ileum_uni_data,
filt_ileum_uni_metadata,
segment=segment,
level=level)
# see the result
## significant interaction effect
list_interaction_significant[[1]]NULL
## results for czech cohort
list_interaction_significant[[2]][1] NA
## results for norwegian cohort
list_interaction_significant[[3]][1] NA
Removing problematic taxa
list_intersections <- removing_interaction_problems(group,
list_interaction_significant,
list_intersections,
segment=segment,
level=level)2.4.1.2.5 Basic statistics
uni_df <- merge(basic_univariate_statistics(linda_data,group),
raw_linda_results_genus[[segment]][[group1]],
by="SeqID",all=TRUE)[1] "healthy"
[1] "rPSC"
[1] "healthy"
[1] "rPSC"
uni_df[["final_sig"]] <- uni_df$SeqID %in% list_intersections[[paste(segment,level,comparison_name)]][["SeqID"]]
uni_statistics[[segment]][[paste(level,comparison_name)]] <- uni_df
# for comparison
new_name <- comparison_name
addWorksheet(wb, sheetName = new_name)
writeData(wb, sheet = new_name, uni_df, rowNames=FALSE)2.4.1.3 non-rPSC vs healthy
group <- c("healthy","non-rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])2.4.1.3.1 linDA
# prepare the data
linda_data <- binomial_prep(ileum_genus_tab,
ileum_genus_taxa_tab,
ileum_metadata,
group, usage="linDA")Removing 24 ASV(s)
Removing 2 ASV(s)
filt_ileum_uni_data <- linda_data[[1]]
filt_ileum_uni_taxa <- linda_data[[2]]
filt_ileum_uni_metadata <- linda_data[[3]]
# fit the model
linda.obj <- linda(filt_ileum_uni_data,
filt_ileum_uni_metadata,
formula = '~ Group * Country')0 features are filtered!
The filtered data has 178 samples and 179 features will be tested!
Imputation approach is used.
Fit linear models ...
Completed.
linda.output <- linda.obj$output
linda.output <- linda_renaming(linda.output, group)
# save the results
group1 <- paste0(group[1], " vs ","Group",group[2])
group2 <- paste0(group[1], " , ",group[2], " - CZ vs NO")
group3 <- paste0(group[1], " vs ","Group",group[2], ":CountryNO")
for (grp in c(group1,group2,group3)){
raw_linda_results_genus[[segment]][[grp]] <-
rawlinda.df(linda.output,
grp,
filt_ileum_uni_data,
filt_ileum_uni_taxa)
linda_results_genus[[segment]][[grp]] <-
linda.df(linda.output,
grp,
filt_ileum_uni_data,
filt_ileum_uni_taxa)
}# volcano plot
volcano_1 <- volcano_plot_linda(linda.output,
group1,
taxa_table = filt_ileum_uni_taxa) +
ggtitle(comparison_title)
volcano_2 <- volcano_plot_linda(linda.output, group2,
taxa_table = filt_ileum_uni_taxa) +
ggtitle("NO vs CZ")
volcano_3 <- volcano_plot_linda(linda.output, group3,
taxa_table = filt_ileum_uni_taxa) +
ggtitle("Interaction effect")
volcano <- ggarrange(volcano_1,volcano_2,volcano_3, ncol=3)
volcano2.4.1.3.2 MaAsLin2
volcano1 <- volcano_plot_maaslin(fit_data,filt_ileum_uni_taxa) +
ggtitle(comparison_title)
volcano2 <- volcano_plot_maaslin(fit_data,filt_ileum_uni_taxa,variable="Country") +
ggtitle("NO vs CZ")
volcano <- ggarrange(volcano1,volcano2, ncol=2)
volcano2.4.1.3.3 Group - Intersection
intersection_results <- group_intersection(group, list_intersections, list_venns,
linda.output, fit_data,
raw_linda_results_genus,
segment = segment,
level=level)
list_intersections <- intersection_results[[1]]
list_venns <- intersection_results[[2]]
venn <- intersection_results[[3]]
# show the results
venn2.4.1.3.4 Interaction effect
list_interaction_significant <- country_interaction(group,
linda.output,
list_intersections,
filt_ileum_uni_data,
filt_ileum_uni_metadata,
segment=segment,
level=level)
# see the result
## significant interaction effect
list_interaction_significant[[1]]NULL
## results for czech cohort
list_interaction_significant[[2]][1] NA
## results for norwegian cohort
list_interaction_significant[[3]][1] NA
Removing problematic taxa
list_intersections <- removing_interaction_problems(group,
list_interaction_significant,
list_intersections,
segment=segment,
level=level)2.4.1.3.5 Basic statistics
uni_df <- merge(basic_univariate_statistics(linda_data,group),
raw_linda_results_genus[[segment]][[group1]],
by="SeqID",all=TRUE)[1] "healthy"
[1] "non-rPSC"
[1] "healthy"
[1] "non-rPSC"
uni_df[["final_sig"]] <- uni_df$SeqID %in% list_intersections[[paste(segment,level,comparison_name)]][["SeqID"]]
uni_statistics[[segment]][[paste(level,comparison_name)]] <- uni_df
# for comparison
new_name <- comparison_name
addWorksheet(wb, sheetName = new_name)
writeData(wb, sheet = new_name, uni_df, rowNames=FALSE)2.4.1.4 Visualization
Heatmap visualizing the linDA’s logFoldChange for taxa with p < 0.1.
list_heatmap <- list_intersections[grep(paste(segment,level),
names(list_intersections),value=TRUE)]
p_heatmap_linda <- heatmap_linda(list_heatmap,ileum_taxa_tab)
p_heatmap_lindaDot heatmap
dotheatmap_linda <- dot_heatmap_linda(
list_heatmap, uni_statistics$terminal_ileum[grepl(level,names(uni_statistics$terminal_ileum))],
ileum_taxa_tab) + xlab("") + ylab("")min_clr -1.727234
max_clr 7.029952
min_log -4.215042
max_log 3.331439
dotheatmap_lindaHorizontal bar plot
p_prevalence <- horizontal_barplot(wb,taxa=levels(dotheatmap_linda$data$SeqID))Using SeqID as id variables
p_prevalence_final <- ggarrange(p_prevalence,
ggplot() + theme_minimal(),
nrow = 2,heights = c(1,0.085))
p <- ggarrange(dotheatmap_linda + theme(legend.position = "none"),p_prevalence_final,ncol=2,widths = c(1,0.3))
pdot_heatmap_ileum <- p2.4.1.5 rPSC effect
pre_LTx vs Healthy and Post_LTx vs Healthy intersection
A <- list_intersections[[paste(segment,level,"healthy vs rPSC")]]
B <- list_intersections[[paste(segment,level,"healthy vs non-rPSC")]]
df <- A[!(A$SeqID %in% B$SeqID),]
rpsc_effect[[paste(segment,level)]] <- df
# see the results
rpsc_effect[[paste(segment,level)]] SeqID
6 Collinsella
8 Adlercreutzia
15 Bacteroides
16 Barnesiella
29 Parabacteroides
32 Bilophila
41 Holdemania
75 Lachnospiraceae_ND3007_group
76 Lachnospiraceae_NK4A136_group
84 Sellimonas
99 Butyricicoccus
134 Family_XIII_AD3011_group
155 Pseudomonas
169 Lacticaseibacillus
188 Veillonellaceae
199 Klebsiella
Taxonomy
6 k__Bacteria;p__Actinobacteriota;c__Coriobacteriia;o__Coriobacteriales;f__Coriobacteriaceae;g__Collinsella
8 k__Bacteria;p__Actinobacteriota;c__Coriobacteriia;o__Coriobacteriales;f__Eggerthellaceae;g__Adlercreutzia
15 k__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Bacteroidaceae;g__Bacteroides
16 k__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Barnesiellaceae;g__Barnesiella
29 k__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Tannerellaceae;g__Parabacteroides
32 k__Bacteria;p__Desulfobacterota;c__Desulfovibrionia;o__Desulfovibrionales;f__Desulfovibrionaceae;g__Bilophila
41 k__Bacteria;p__Firmicutes;c__Bacilli;o__Erysipelotrichales;f__Erysipelotrichaceae;g__Holdemania
75 k__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Lachnospiraceae_ND3007_group
76 k__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Lachnospiraceae_NK4A136_group
84 k__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Sellimonas
99 k__Bacteria;p__Firmicutes;c__Clostridia;o__Oscillospirales;f__Butyricicoccaceae;g__Butyricicoccus
134 k__Bacteria;p__Firmicutes;c__Clostridia;o__Peptostreptococcales-Tissierellales;f__Anaerovoracaceae;g__Family_XIII_AD3011_group
155 k__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Pseudomonadales;f__Pseudomonadaceae;g__Pseudomonas
169 k__Bacteria;p__Firmicutes;c__Bacilli;o__Lactobacillales;f__Lactobacillaceae;g__Lacticaseibacillus
188 k__Bacteria;p__Firmicutes;c__Negativicutes;o__Veillonellales-Selenomonadales;f__Veillonellaceae;g__Veillonellaceae
199 k__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;g__Klebsiella
log2FoldChange p_value padj
6 -2.349468 1.092155e-02 0.0443286425
8 -1.638315 7.387504e-03 0.0332437679
15 -2.119974 3.932211e-04 0.0040698386
16 -4.215042 1.576685e-05 0.0004079671
29 -3.049283 2.871549e-05 0.0005403732
32 -2.837220 4.305534e-05 0.0007427046
41 -1.686545 1.351704e-03 0.0099929559
75 -2.216009 4.785012e-03 0.0260657213
76 -2.596492 1.644208e-04 0.0020100837
84 1.773614 1.044921e-02 0.0441425798
99 -1.751985 3.166657e-03 0.0204843095
134 -2.011614 1.298386e-04 0.0019186055
155 1.025212 8.916234e-03 0.0384512596
169 1.281555 1.747899e-04 0.0020100837
188 1.244317 1.665793e-04 0.0020100837
199 2.461632 4.570063e-03 0.0260223513
2.4.1.6 Saving results
# ALL DATA
saveWorkbook(wb,file.path(path,paste0("uni_analysis_wb_",segment,".xlsx")),
overwrite = TRUE)
# PSC effect
write.xlsx(rpsc_effect[[paste(segment,level)]],file.path(path,paste0("rpsc_effect_",segment,".xlsx")))
# SIGNIFICANT taxa
write.xlsx(list_intersections[grepl(segment,names(list_intersections))] %>%
`names<-`(gsub(segment, "", names(
list_intersections[grepl(segment,names(list_intersections))]))),
file.path(path,paste0("significant_taxa_",segment,".xlsx")))2.5 Machine learning
path = "../results/Q2/models"2.5.1 ElasticNet
model="enet"2.5.1.1 Genus level
level="genus"Aggregate taxa
genus_data <- aggregate_taxa(ileum_asv_tab,
ileum_taxa_tab,
taxonomic_level = level)
ileum_genus_tab <- genus_data[[1]]
ileum_genus_taxa_tab <- genus_data[[2]]2.5.1.1.1 rPSC vs non-rPSC - all
group <- c("rPSC","non-rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_genus_tab,
ileum_genus_taxa_tab,
ileum_metadata,
group,
usage="ml_clr")Removing 55 ASV(s)
Removing 2 ASV(s)
# fit the model
enet_model <- glmnet_binomial(filt_ileum_uni_data,
sample_method = "atypboot",
outcome="Group",
N=500,
reuse=TRUE,
file=model_name,
Q="Q2")
# ROC
roc_c <- roc_curve(enet_model, group)
# save the results
models_summ <- list()
models_cm <- list()
roc_cs <- list()
betas <- list()
models_summ[[model_name]] <- enet_model$model_summary
models_cm[[model_name]] <- enet_model$conf_matrices$original
roc_cs[[model_name]] <- enet_model$kfold_rocobjs
betas[[model_name]] <- as.matrix(enet_model$betas)
# see the results
enet_model$model_summary %>% t() [,1]
alpha 0.4000000
lambda 0.3119645
auc 0.6432900
auc_czech 0.6567842
auc_no 0.6069959
auc_optimism_corrected 0.5736934
auc_optimism_corrected_CIL 0.3905857
auc_optimism_corrected_CIU 0.7322617
accuracy 0.7608696
accuracy_czech NaN
accuracy_no 0.7500000
accuracy_optimism_corrected 0.7353858
accuracy_optimism_corrected_CIL 0.6288683
accuracy_optimism_corrected_CIU 0.8277572
enet_model$conf_matrices$original
0
0 105 0
1 33 0
$czech
0
0 78 0
1 24 0
$no
0
0 27 0
1 9 0
enet_model$plot`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
roc_c2.5.1.1.2 rPSC vs non-rPSC - subset
group <- c("rPSC","non-rPSC")
comparison_name <- paste0(group[1], " vs ",group[2]," subset")taxa_to_test <- read.xlsx("../results/Q1/univariate_analysis/significant_taxa_terminal_ileum.xlsx", sheet=" genus healthy vs pre_ltx")
model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_ileum_uni_data <- binomial_prep_psc_effect(ileum_genus_tab,
ileum_genus_taxa_tab,
ileum_metadata,
taxa_to_test,
usage="ml_clr")Removing 55 ASV(s)
Removing 2 ASV(s)
# fit the model
enet_model <- glmnet_binomial(filt_ileum_uni_data,
sample_method = "atypboot",
outcome="Group",
N=500,
reuse=TRUE,
file=model_name,
Q="Q2")
# ROC
roc_c <- roc_curve(enet_model, group)
# save the results
models_summ[[model_name]] <- enet_model$model_summary
models_cm[[model_name]] <- enet_model$conf_matrices$original
roc_cs[[model_name]] <- enet_model$kfold_rocobjs
betas[[model_name]] <- as.matrix(enet_model$betas)
# see the results
enet_model$model_summary %>% t() [,1]
alpha 0.2000000
lambda 0.5017248
auc 0.5160173
auc_czech 0.5224359
auc_no 0.5000000
auc_optimism_corrected 0.5599297
auc_optimism_corrected_CIL 0.3863553
auc_optimism_corrected_CIU 0.7232201
accuracy 0.7608696
accuracy_czech NaN
accuracy_no 0.7500000
accuracy_optimism_corrected 0.7539422
accuracy_optimism_corrected_CIL 0.6525954
accuracy_optimism_corrected_CIU 0.8442308
enet_model$conf_matrices$original
0
0 105 0
1 33 0
$czech
0
0 78 0
1 24 0
$no
0
0 27 0
1 9 0
enet_model$plot`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
roc_c2.5.1.2 Saving results
models_summ_df_ileum <- do.call(rbind,
models_summ[grep(segment,names(models_summ),value = TRUE)])
write.csv(models_summ_df_ileum,file.path(path,paste0("elastic_net_",segment,".csv")))2.5.2 Supplementary models
supplements_models <- list()2.5.2.1 CLR-transformed data
2.5.2.1.1 kNN
model="knn"2.5.2.1.1.1 Genus level
level="genus"Aggregate taxa
genus_data <- aggregate_taxa(ileum_asv_tab,
ileum_taxa_tab,
taxonomic_level = level)
ileum_genus_tab <- genus_data[[1]]
ileum_genus_taxa_tab <- genus_data[[2]]rPSC vs non-rPSC all
group <- c("rPSC","non-rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_genus_tab,
ileum_genus_taxa_tab,
ileum_metadata,
group,
usage="ml_clr")Removing 55 ASV(s)
Removing 2 ASV(s)
# fit the model
knn_model <- knn_binomial(filt_ileum_uni_data,
sample_method = "atypboot",
outcome="Group",
N=500,
reuse=TRUE,
file=model_name,
Q="Q2")
# ROC curve
roc_c <- roc_curve(knn_model, group)
# save the results
supplements_models[["models_summ"]][["knn_model"]][[model_name]] <- knn_model$model_summary
supplements_models[["roc_cs"]][["knn_model"]][[model_name]] <- knn_model$kfold_rocobjs
# see the results
knn_model$model_summary %>% t() [,1]
k 30.0000000
auc 0.6545455
auc_optimism_corrected 0.5178066
auc_optimism_corrected_CIL 0.3207460
auc_optimism_corrected_CIU 0.6850148
accuracy 0.7608696
accuracy_optimism_corrected 0.7569849
accuracy_optimism_corrected_CIL 0.6597766
accuracy_optimism_corrected_CIU 0.8489674
roc_crPSC vs non-rPSC - subset
group <- c("rPSC","non-rPSC")
comparison_name <- paste0(group[1], " vs ",group[2]," subset")taxa_to_test <- read.xlsx("../results/Q1/univariate_analysis/significant_taxa_terminal_ileum.xlsx", sheet=" genus healthy vs pre_ltx")
model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_ileum_uni_data <- binomial_prep_psc_effect(ileum_genus_tab,
ileum_genus_taxa_tab,
ileum_metadata,
taxa_to_test,
usage="ml_clr")Removing 55 ASV(s)
Removing 2 ASV(s)
# fit the model
knn_model <- knn_binomial(filt_ileum_uni_data,
sample_method = "atypboot",
outcome="Group",
N=500,
reuse=TRUE,
file=model_name,
Q="Q2")
# ROC
roc_c <- roc_curve(knn_model, group)
# save the results
supplements_models[["models_summ"]][["knn_model"]][[model_name]] <- knn_model$model_summary
supplements_models[["roc_cs"]][["knn_model"]][[model_name]] <- knn_model$kfold_rocobjs
# see the results
knn_model$model_summary %>% t() [,1]
k 30.0000000
auc 0.7000000
auc_optimism_corrected 0.5814282
auc_optimism_corrected_CIL 0.4123932
auc_optimism_corrected_CIU 0.7384905
accuracy 0.7608696
accuracy_optimism_corrected 0.7493972
accuracy_optimism_corrected_CIL 0.6412554
accuracy_optimism_corrected_CIU 0.8403215
roc_c2.5.2.1.2 Random Forest
model="rf"2.5.2.1.2.1 Genus level
level="genus"Aggregate taxa
genus_data <- aggregate_taxa(ileum_asv_tab,
ileum_taxa_tab,
taxonomic_level = level)
ileum_genus_tab <- genus_data[[1]]
ileum_genus_taxa_tab <- genus_data[[2]]rPSC vs non-rPSC - all
group <- c("rPSC","non-rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_genus_tab,
ileum_genus_taxa_tab,
ileum_metadata,
group,
usage="ml_clr")Removing 55 ASV(s)
Removing 2 ASV(s)
# fit the model
rf_model <- rf_binomial(filt_ileum_uni_data,
sample_method = "atypboot",
outcome="Group",
N=500,
reuse=TRUE,
file=model_name,
Q="Q2")
# ROC curve
roc_c <- roc_curve(rf_model, group)
# save the results
supplements_models[["models_summ"]][["rf_model"]][[model_name]] <- rf_model$model_summary
supplements_models[["roc_cs"]][["rf_model"]][[model_name]] <- rf_model$kfold_rocobjs
# see the results
rf_model$model_summary %>% t() [,1]
mtry "137"
splitrule "gini"
min.node.size "2"
auc "1"
auc_optimism_corrected "0.5757215"
auc_optimism_corrected_CIL "0.3834014"
auc_optimism_corrected_CIU "0.7556583"
accuracy "1"
accuracy_optimism_corrected "0.7398359"
accuracy_optimism_corrected_CIL "0.640717"
accuracy_optimism_corrected_CIU "0.838449"
roc_crPSC vs non-rPSC - subset
group <- c("rPSC","non-rPSC")
comparison_name <- paste0(group[1], " vs ",group[2]," subset")taxa_to_test <- read.xlsx("../results/Q1/univariate_analysis/significant_taxa_terminal_ileum.xlsx", sheet=" genus healthy vs pre_ltx")
model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_ileum_uni_data <- binomial_prep_psc_effect(ileum_genus_tab,
ileum_genus_taxa_tab,
ileum_metadata,
taxa_to_test,
usage="ml_clr")Removing 55 ASV(s)
Removing 2 ASV(s)
# fit the model
rf_model <- rf_binomial(filt_ileum_uni_data,
sample_method = "atypboot",
outcome="Group",
N=500,
reuse=TRUE,
file=model_name,
Q="Q2")
# ROC curve
roc_c <- roc_curve(rf_model, group)
# save the results
supplements_models[["models_summ"]][["rf_model"]][[model_name]] <- rf_model$model_summary
supplements_models[["roc_cs"]][["rf_model"]][[model_name]] <- rf_model$kfold_rocobjs
# see the results
rf_model$model_summary %>% t() [,1]
mtry "19"
splitrule "gini"
min.node.size "5"
auc "1"
auc_optimism_corrected "0.6342782"
auc_optimism_corrected_CIL "0.4720432"
auc_optimism_corrected_CIU "0.7809896"
accuracy "1"
accuracy_optimism_corrected "0.7525932"
accuracy_optimism_corrected_CIL "0.6464154"
accuracy_optimism_corrected_CIU "0.8541667"
roc_c2.5.2.1.3 Gradient boosting
model="gb"2.5.2.1.3.1 Genus level
level="genus"Aggregate taxa
genus_data <- aggregate_taxa(ileum_asv_tab,
ileum_taxa_tab,
taxonomic_level = level)
ileum_genus_tab <- genus_data[[1]]
ileum_genus_taxa_tab <- genus_data[[2]]rPSC vs non-rPSC - all
group <- c("rPSC","non-rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_genus_tab,
ileum_genus_taxa_tab,
ileum_metadata,
group,
usage="ml_clr")Removing 55 ASV(s)
Removing 2 ASV(s)
# fit the model
gbm_model <- gbm_binomial(filt_ileum_uni_data,
sample_method = "atypboot",
outcome="Group",
N=500,
reuse=TRUE,
file=model_name,
Q="Q2")
# ROC curve
roc_c <- roc_curve(gbm_model, group)
# save the results
supplements_models[["models_summ"]][["gbm_model"]][[model_name]] <- gbm_model$model_summary
supplements_models[["roc_cs"]][["gbm_model"]][[model_name]] <- gbm_model$kfold_rocobjs
# see the results
gbm_model$model_summary %>% t() [,1]
n.trees 200.0000000
interaction.depth 3.0000000
shrinkage 0.1000000
n.minobsinnode 20.0000000
auc 1.0000000
auc_optimism_corrected 0.5479922
auc_optimism_corrected_CIL 0.3907143
auc_optimism_corrected_CIU 0.6989388
accuracy 1.0000000
accuracy_optimism_corrected 0.7187818
accuracy_optimism_corrected_CIL 0.6200000
accuracy_optimism_corrected_CIU 0.8222222
roc_crPSC vs non-rPSC - subset
group <- c("rPSC","non-rPSC")
comparison_name <- paste0(group[1], " vs ",group[2]," subset")taxa_to_test <- read.xlsx("../results/Q1/univariate_analysis/significant_taxa_terminal_ileum.xlsx", sheet=" genus healthy vs pre_ltx")
model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_ileum_uni_data <- binomial_prep_psc_effect(ileum_genus_tab,
ileum_genus_taxa_tab,
ileum_metadata,
taxa_to_test,
usage="ml_clr")Removing 55 ASV(s)
Removing 2 ASV(s)
# fit the model
gbm_model <- gbm_binomial(filt_ileum_uni_data,
sample_method = "atypboot",
outcome="Group",
N=500,
reuse=TRUE,
file=model_name,
Q="Q2")
# ROC curve
roc_c <- roc_curve(gbm_model, group)
# save the results
supplements_models[["models_summ"]][["gbm_model"]][[model_name]] <- gbm_model$model_summary
supplements_models[["roc_cs"]][["gbm_model"]][[model_name]] <- gbm_model$kfold_rocobjs
# see the results
gbm_model$model_summary %>% t() [,1]
n.trees 100.0000000
interaction.depth 1.0000000
shrinkage 0.1000000
n.minobsinnode 20.0000000
auc 0.9624820
auc_optimism_corrected 0.6125697
auc_optimism_corrected_CIL 0.4346418
auc_optimism_corrected_CIU 0.7637106
accuracy 0.8985507
accuracy_optimism_corrected 0.7516319
accuracy_optimism_corrected_CIL 0.6601792
accuracy_optimism_corrected_CIU 0.8453419
roc_c2.5.2.2 Saving results
models_list <- list()
for (model_name in names(supplements_models$models_summ)){
df <- do.call(rbind, supplements_models$models_summ[[model_name]])
models_list[[model_name]] <- df
}
write.xlsx(models_list,
file=file.path(path,paste0("supplements_models_",segment,".xlsx")),
rowNames=TRUE)2.6 Results overview
2.6.0.1 Alpha diversity
Models
knitr::kable(pc_observed[[segment]],
digits = 3,
caption = "Results of linear model testing ASV Richness")| Estimate | Std..Error | t.value | Pr…t.. | p.adj | sig | |
|---|---|---|---|---|---|---|
| non-rPSC vs GrouprPSC | -12.199 | 11.972 | -1.019 | 0.310 | 0.349 | |
| non-rPSC , rPSC - CZ vs NO | -21.486 | 11.452 | -1.876 | 0.063 | 0.141 | |
| non-rPSC vs GrouprPSC:CountryNO | 9.347 | 23.088 | 0.405 | 0.686 | 0.686 | |
| healthy vs GrouprPSC | -37.741 | 12.284 | -3.072 | 0.003 | 0.025 | * |
| healthy , rPSC - CZ vs NO | 11.203 | 10.973 | 1.021 | 0.310 | 0.349 | |
| healthy vs GrouprPSC:CountryNO | -23.342 | 21.355 | -1.093 | 0.277 | 0.349 | |
| healthy vs Groupnon-rPSC | -25.542 | 9.322 | -2.740 | 0.007 | 0.031 | * |
| healthy , non-rPSC - CZ vs NO | 11.203 | 10.933 | 1.025 | 0.307 | 0.349 | |
| healthy vs Groupnon-rPSC:CountryNO | -32.689 | 15.108 | -2.164 | 0.032 | 0.096 |
knitr::kable(pc_shannon[[segment]],
digits = 3,
caption = "Results of linear model testing Shannon index")| Estimate | Std..Error | t.value | Pr…t.. | p.adj | sig | |
|---|---|---|---|---|---|---|
| non-rPSC vs GrouprPSC | -0.102 | 0.146 | -0.696 | 0.487 | 0.731 | |
| non-rPSC , rPSC - CZ vs NO | -0.349 | 0.140 | -2.500 | 0.014 | 0.123 | |
| non-rPSC vs GrouprPSC:CountryNO | -0.006 | 0.281 | -0.021 | 0.983 | 0.983 | |
| healthy vs GrouprPSC | -0.210 | 0.140 | -1.501 | 0.137 | 0.321 | |
| healthy , rPSC - CZ vs NO | 0.005 | 0.125 | 0.043 | 0.966 | 0.983 | |
| healthy vs GrouprPSC:CountryNO | -0.360 | 0.244 | -1.478 | 0.142 | 0.321 | |
| healthy vs Groupnon-rPSC | -0.109 | 0.115 | -0.948 | 0.344 | 0.620 | |
| healthy , non-rPSC - CZ vs NO | 0.005 | 0.135 | 0.040 | 0.968 | 0.983 | |
| healthy vs Groupnon-rPSC:CountryNO | -0.354 | 0.186 | -1.905 | 0.058 | 0.263 |
knitr::kable(pc_simpson[[segment]],
digits = 3,
caption = "Results of linear model testing Simpson index")| Estimate | Std..Error | t.value | Pr…t.. | p.adj | sig | |
|---|---|---|---|---|---|---|
| non-rPSC vs GrouprPSC | 0.000 | 0.023 | 0.019 | 0.985 | 0.985 | |
| non-rPSC , rPSC - CZ vs NO | -0.034 | 0.022 | -1.573 | 0.118 | 0.531 | |
| non-rPSC vs GrouprPSC:CountryNO | -0.029 | 0.043 | -0.661 | 0.510 | 0.767 | |
| healthy vs GrouprPSC | -0.009 | 0.018 | -0.500 | 0.618 | 0.767 | |
| healthy , rPSC - CZ vs NO | -0.008 | 0.016 | -0.501 | 0.617 | 0.767 | |
| healthy vs GrouprPSC:CountryNO | -0.055 | 0.031 | -1.779 | 0.078 | 0.531 | |
| healthy vs Groupnon-rPSC | -0.009 | 0.016 | -0.564 | 0.574 | 0.767 | |
| healthy , non-rPSC - CZ vs NO | -0.008 | 0.019 | -0.410 | 0.682 | 0.767 | |
| healthy vs Groupnon-rPSC:CountryNO | -0.026 | 0.027 | -0.975 | 0.331 | 0.767 |
knitr::kable(pc_pielou[[segment]],
digits = 3,
caption = "Results of linear model testing Pielou index")| Estimate | Std..Error | t.value | Pr…t.. | p.adj | sig | |
|---|---|---|---|---|---|---|
| non-rPSC vs GrouprPSC | -0.004 | 0.022 | -0.184 | 0.854 | 0.946 | |
| non-rPSC , rPSC - CZ vs NO | -0.047 | 0.021 | -2.260 | 0.025 | 0.229 | |
| non-rPSC vs GrouprPSC:CountryNO | -0.005 | 0.042 | -0.108 | 0.914 | 0.946 | |
| healthy vs GrouprPSC | -0.001 | 0.021 | -0.067 | 0.946 | 0.946 | |
| healthy , rPSC - CZ vs NO | -0.009 | 0.018 | -0.463 | 0.645 | 0.946 | |
| healthy vs GrouprPSC:CountryNO | -0.043 | 0.036 | -1.195 | 0.235 | 0.705 | |
| healthy vs Groupnon-rPSC | 0.003 | 0.018 | 0.146 | 0.884 | 0.946 | |
| healthy , non-rPSC - CZ vs NO | -0.009 | 0.021 | -0.409 | 0.683 | 0.946 | |
| healthy vs Groupnon-rPSC:CountryNO | -0.038 | 0.029 | -1.329 | 0.185 | 0.705 |
Plots
alpha_div_plots[[paste(segment,"Country")]]2.6.0.2 Beta diversity
Main results
knitr::kable(pairwise_aitchison_raw[[paste("genus", segment)]],
digits = 3,
caption = "Results of PERMANOVA - robust aitchison distance")| pairs | Df | SumsOfSqs | F.Model | R2 | p.value | p.adjusted | sig |
|---|---|---|---|---|---|---|---|
| rPSC vs non-rPSC | 1 | 215.579 | 1.208 | 0.009 | 0.107 | 0.107 | |
| rPSC vs healthy | 1 | 561.696 | 3.345 | 0.030 | 0.001 | 0.002 | ** |
| non-rPSC vs healthy | 1 | 760.242 | 4.431 | 0.024 | 0.001 | 0.002 | ** |
| rPSC vs non-rPSC , Country | 1 | 601.929 | 3.372 | 0.024 | 0.001 | 0.001 | *** |
| rPSC vs healthy , Country | 1 | 496.841 | 2.959 | 0.027 | 0.001 | 0.001 | *** |
| non-rPSC vs healthy , Country | 1 | 624.474 | 3.640 | 0.020 | 0.001 | 0.001 | *** |
| rPSC vs non-rPSC : Country | 1 | 156.176 | 0.874 | 0.006 | 0.788 | 0.788 | |
| rPSC vs healthy : Country | 1 | 164.743 | 0.981 | 0.009 | 0.464 | 0.696 | |
| non-rPSC vs healthy : Country | 1 | 209.954 | 1.225 | 0.007 | 0.089 | 0.267 |
PCA
pca_plots_list[[paste(segment,"genus custom")]]Supplements
knitr::kable(supplements_beta[!grepl("PCoA",names(supplements_beta)) & (grepl("genus",names(supplements_beta)))],
digits = 3,
caption = "Supplementary PERMANOVA results: Bray-curtis, Jaccard distances")
|
|
Supplements PCA
ggarrange(plotlist = supplements_beta[grepl("PCoA",names(supplements_beta))],
labels=names(supplements_beta[grepl("PCoA",names(supplements_beta))]),
font.label = list(size=5,face="plain"),
ncol=2,nrow=3)2.6.0.3 Univariate analysis
Number of significant taxa
knitr::kable(cbind(as.data.frame(lapply(list_intersections,nrow)),
as.data.frame(lapply(rpsc_effect,nrow))) %>% t() %>%
`colnames<-`("Count") %>%
`rownames<-`(c(names(list_intersections),"PSC effect Genus")),caption="Number of significant taxa")| Count | |
|---|---|
| terminal_ileum genus non-rPSC vs rPSC | 0 |
| terminal_ileum genus healthy vs rPSC | 34 |
| terminal_ileum genus healthy vs non-rPSC | 25 |
| PSC effect Genus | 16 |
2.6.0.4 Machine learning
Main models
knitr::kable(models_summ_df_ileum %>% dplyr::select(
"alpha","lambda",
"auc_optimism_corrected",
"auc_optimism_corrected_CIL",
"auc_optimism_corrected_CIU"),
digits=2,caption="Elastic net results")| alpha | lambda | auc_optimism_corrected | auc_optimism_corrected_CIL | auc_optimism_corrected_CIU | |
|---|---|---|---|---|---|
| rPSC vs non-rPSC genus terminal_ileum | 0.4 | 0.31 | 0.57 | 0.39 | 0.73 |
| rPSC vs non-rPSC subset genus terminal_ileum | 0.2 | 0.50 | 0.56 | 0.39 | 0.72 |
ROC - Genus level
roc_curve_all_custom(roc_cs[c(1:2)],Q="Q2",
model_name="enet_model",legend = FALSE)Supplementary models
# Build final dataframe
models_list[["enet_model"]] <- models_summ_df_ileum
final_df <- tibble(row_names = rownames(models_list[[1]]))
# Loop through models and extract required values
for (model_name in names(models_list)) {
model_df <- models_list[[model_name]]
# Combine AUC_optimism_corrected with its CI values
final_df[[model_name]] <- paste0(
round(model_df$auc_optimism_corrected, 3),
" (", round(model_df$auc_optimism_corrected_CIL, 3), "; ",
round(model_df$auc_optimism_corrected_CIU, 3), ")"
)
}
knitr::kable(final_df, caption="All models")| row_names | knn_model | rf_model | gbm_model | enet_model |
|---|---|---|---|---|
| rPSC vs non-rPSC genus terminal_ileum | 0.518 (0.321; 0.685) | 0.576 (0.383; 0.756) | 0.548 (0.391; 0.699) | 0.574 (0.391; 0.732) |
| rPSC vs non-rPSC subset genus terminal_ileum | 0.581 (0.412; 0.738) | 0.634 (0.472; 0.781) | 0.613 (0.435; 0.764) | 0.56 (0.386; 0.723) |
Supplementary ROC - genus
rocs_list <- supplements_models$roc_cs
rocs_list[["enet_model"]] <- roc_cs
plot_list <- list()
for (model_name in names(rocs_list)) {
plot_list[[model_name]] <- roc_curve_all_custom(rocs_list[[model_name]][c(1:2)],
Q="Q2",
model_name=model_name,legend = FALSE)
}
p <- ggarrange(plotlist = plot_list,labels = names(rocs_list),font.label = list(face="plain",size=7))
p3 Data Analysis - Colon
segment="colon"3.1 Filtering
Rules: - sequencing depth > 10000
- nearZeroVar() with default settings
Rarefaction Curve
path="../intermediate_files/rarecurves"
seq_depth_threshold <- 10000ps <- construct_phyloseq(colon_asv_tab,colon_taxa_tab,colon_metadata)
rareres <- get_rarecurve(obj=ps, chunks=500)
save(rareres,file = file.path(path,"rarefaction_colon.Rdata"))load(file.path(path,"rarefaction_colon.Rdata"))
seq_depth_threshold <- 10000
prare <- ggrarecurve(obj=rareres,
factorNames="Country",
indexNames=c("Observe")) +
theme_bw()+
theme(axis.text=element_text(size=8),
panel.grid=element_blank(),
strip.background = element_rect(colour=NA,fill="grey"),
strip.text.x = element_text(face="bold")) +
geom_vline(xintercept = seq_depth_threshold,
linetype="dashed", color = "red") +
xlim(0, 20000)The color has been set automatically, you can reset it manually by adding scale_color_manual(values=yourcolors)
prareLibrary size
read_counts(colon_asv_tab, line = c(5000,10000))Scale for y is already present.
Adding another scale for y, which will replace the existing scale.
3.1.1 Sequencing depth
data_filt <- seq_depth_filtering(colon_asv_tab,
colon_taxa_tab,
colon_metadata,
seq_depth_threshold = 10000)Removing 50 ASV(s)
filt_colon_asv_tab <- data_filt[[1]]; alpha_colon_asv_tab <- filt_colon_asv_tab
filt_colon_taxa_tab <- data_filt[[2]]; alpha_colon_taxa_tab <- filt_colon_taxa_tab
filt_colon_metadata <- data_filt[[3]]; alpha_colon_metadata <- filt_colon_metadata
seq_step <- dim(filt_colon_asv_tab)[1]Library size
read_counts(filt_colon_asv_tab,line = c(10000))Scale for y is already present.
Adding another scale for y, which will replace the existing scale.
3.1.2 NearZeroVar
data_filt <- nearzerovar_filtering(filt_colon_asv_tab,
filt_colon_taxa_tab,
filt_colon_metadata)
filt_colon_asv_tab <- data_filt[[1]]
filt_colon_taxa_tab <- data_filt[[2]]
nearzero_step <- dim(filt_colon_asv_tab)[1]Library size
read_counts(filt_colon_asv_tab,line = c(5000,10000))Scale for y is already present.
Adding another scale for y, which will replace the existing scale.
Check zero depth
data_filt <- check_zero_depth(filt_colon_asv_tab,
filt_colon_taxa_tab,
filt_colon_metadata)
filt_colon_asv_tab <- data_filt[[1]];
filt_colon_taxa_tab <- data_filt[[2]];
filt_colon_metadata <- data_filt[[3]]; Library size
read_counts(filt_colon_asv_tab,line = c(5000,10000))Scale for y is already present.
Adding another scale for y, which will replace the existing scale.
3.1.3 Final Counts
final_counts_filtering(colon_asv_tab,
filt_colon_asv_tab,
filt_colon_metadata,
seq_step, 0, nearzero_step) V1
Raw data: ASVs 5840
Raw data: Samples 527
Sequencing depth filt: ASVs 5790
Prevalence filt: ASVs 0
NearZeroVar filt: ASVs 444
Filt data: ASVs 444
Filt data: Samples 511
Filt data: Patients 250
Filt data: Patients.1 0
Filtered samples 16
Matrices Cecum;Rectum;CD;CA;SI
healthy 161
non-rPSC 263
rPSC 87
pre_ltx 0
post_ltx 0
ETOH 0
3.2 Alpha diversity
path = "../results/Q2/alpha_diversity"Calculation
# Construct MPSE object
alpha_colon_metadata$Sample <- alpha_colon_metadata$SampleID
colon_mpse <- as.MPSE(construct_phyloseq(alpha_colon_asv_tab,
alpha_colon_taxa_tab,
alpha_colon_metadata))
colon_mpse %<>% mp_rrarefy(raresize = 10000,seed = 123)
# Calculate alpha diversity - rarefied counts
colon_mpse %<>% mp_cal_alpha(.abundance=RareAbundance, force=TRUE)alpha_data <- data.frame(SampleID=colon_mpse$Sample.x,
Observe=colon_mpse$Observe,
Shannon=colon_mpse$Shannon,
Simpson=colon_mpse$Simpson,
Pielou=colon_mpse$Pielou,
Group=colon_mpse$Group,
Country=colon_mpse$Country,
Patient=colon_mpse$Patient)
write.csv(alpha_data,file.path(path,paste0("alpha_indices_",segment,".csv")),
row.names = FALSE)3.2.1 Plots
p_boxplot_alpha <- alpha_diversity_countries(alpha_data)Using SampleID, Group, Country, Patient as id variables
Scale for y is already present.
Adding another scale for y, which will replace the existing scale.
Using SampleID, Group, Country, Patient as id variables
Scale for y is already present.
Adding another scale for y, which will replace the existing scale.
# save the results
alpha_div_plots[[paste(segment,"Country")]] <- p_boxplot_alpha
# see the results
p_boxplot_alpha3.2.2 Linear Models
path = "../results/Q2/alpha_diversity"
alpha_data <- read.csv(file.path(path,paste0("alpha_indices_",segment,".csv")))Richness
results_model <- pairwise.lmer(
formula = "Observe ~ Group * Country + (1|Patient)",
factors=alpha_data$Group,
data=alpha_data)
# check interaction
if (!is.data.frame(results_model)){
results_model_observe <- results_model[[1]]
results_model_observe_detailed <- results_model[[2]]
} else {
results_model_observe <- results_model
results_model_observe_detailed <- NA
}
# save the results
pc_observed[[segment]] <- results_model_observe# see the results
knitr::kable(results_model_observe,digits = 3,
caption = "Raw results of linear model of richness estimation.")| Estimate | Std..Error | df | t.value | Pr…t.. | p.adj | sig | |
|---|---|---|---|---|---|---|---|
| non-rPSC vs GrouprPSC | -17.252 | 12.469 | 156.883 | -1.384 | 0.168 | 0.192 | |
| non-rPSC vs rPSC - CZ vs NO | -17.941 | 10.644 | 153.303 | -1.686 | 0.094 | 0.192 | |
| non-rPSC vs GrouprPSC:CountryNO | 3.483 | 21.371 | 152.108 | 0.163 | 0.871 | 0.871 | |
| healthy vs GrouprPSC | -40.307 | 11.924 | 125.516 | -3.380 | 0.001 | 0.009 | ** |
| healthy vs rPSC - CZ vs NO | 14.783 | 10.544 | 126.613 | 1.402 | 0.163 | 0.192 | |
| healthy vs GrouprPSC:CountryNO | -29.232 | 19.769 | 123.028 | -1.479 | 0.142 | 0.192 | |
| healthy vs Groupnon-rPSC | -22.976 | 8.905 | 211.354 | -2.580 | 0.011 | 0.047 | * |
| healthy vs non-rPSC - CZ vs NO | 14.961 | 10.892 | 212.956 | 1.374 | 0.171 | 0.192 | |
| healthy vs Groupnon-rPSC:CountryNO | -32.911 | 14.701 | 208.353 | -2.239 | 0.026 | 0.079 |
knitr::kable(results_model_observe_detailed,digits = 3,
caption = "Raw results of independent country analysis")
|
Shannon
results_model <- pairwise.lmer(
formula = "Shannon ~ Group * Country + (1|Patient)",
factors=alpha_data$Group,
data=alpha_data)
# check interaction
if (!is.data.frame(results_model)){
results_model_shannon <- results_model[[1]]
results_model_shannon_detailed <- results_model[[2]]
} else {
results_model_shannon <- results_model
results_model_shannon_detailed <- NA
}
# save the results
pc_shannon[[segment]] <- as.data.frame(results_model_shannon)# see the results
knitr::kable(results_model_shannon,digits = 3,
caption = "Raw results of linear model of Shannon estimation.")| Estimate | Std..Error | df | t.value | Pr…t.. | p.adj | sig | |
|---|---|---|---|---|---|---|---|
| non-rPSC vs GrouprPSC | -0.146 | 0.161 | 154.485 | -0.907 | 0.366 | 0.549 | |
| non-rPSC vs rPSC - CZ vs NO | -0.355 | 0.138 | 151.788 | -2.581 | 0.011 | 0.051 | |
| non-rPSC vs GrouprPSC:CountryNO | 0.008 | 0.277 | 150.856 | 0.030 | 0.976 | 0.976 | |
| healthy vs GrouprPSC | -0.330 | 0.128 | 126.201 | -2.572 | 0.011 | 0.051 | |
| healthy vs rPSC - CZ vs NO | 0.017 | 0.113 | 127.217 | 0.146 | 0.884 | 0.976 | |
| healthy vs GrouprPSC:CountryNO | -0.364 | 0.213 | 123.907 | -1.709 | 0.090 | 0.162 | |
| healthy vs Groupnon-rPSC | -0.182 | 0.101 | 207.051 | -1.795 | 0.074 | 0.162 | |
| healthy vs non-rPSC - CZ vs NO | 0.018 | 0.124 | 208.617 | 0.145 | 0.884 | 0.976 | |
| healthy vs Groupnon-rPSC:CountryNO | -0.375 | 0.168 | 204.129 | -2.240 | 0.026 | 0.078 |
knitr::kable(results_model_shannon_detailed,digits = 3,
caption = "Raw results of independent country analysis")
|
|
Simpson
results_model <- pairwise.lmer(
formula = "Simpson ~ Group * Country + (1|Patient)",
factors=alpha_data$Group,
data=alpha_data)
# check interaction
if (!is.data.frame(results_model)){
results_model_simpson <- results_model[[1]]
results_model_simpson_detailed <- results_model[[2]]
} else {
results_model_simpson <- results_model
results_model_simpson_detailed <- NA
}
# save the results
pc_simpson[[segment]] <- as.data.frame(results_model_simpson)# see the results
knitr::kable(results_model_simpson,digits = 3,
caption = "Raw results of linear model of Simpson estimation.")| Estimate | Std..Error | df | t.value | Pr…t.. | p.adj | sig | |
|---|---|---|---|---|---|---|---|
| non-rPSC vs GrouprPSC | -0.005 | 0.029 | 153.608 | -0.169 | 0.866 | 0.866 | |
| non-rPSC vs rPSC - CZ vs NO | -0.032 | 0.024 | 151.251 | -1.296 | 0.197 | 0.443 | |
| non-rPSC vs GrouprPSC:CountryNO | -0.027 | 0.049 | 150.427 | -0.554 | 0.581 | 0.806 | |
| healthy vs GrouprPSC | -0.031 | 0.020 | 127.660 | -1.523 | 0.130 | 0.391 | |
| healthy vs rPSC - CZ vs NO | -0.007 | 0.018 | 128.411 | -0.402 | 0.688 | 0.806 | |
| healthy vs GrouprPSC:CountryNO | -0.052 | 0.034 | 125.982 | -1.540 | 0.126 | 0.391 | |
| healthy vs Groupnon-rPSC | -0.026 | 0.016 | 201.924 | -1.600 | 0.111 | 0.391 | |
| healthy vs non-rPSC - CZ vs NO | -0.007 | 0.020 | 203.469 | -0.363 | 0.717 | 0.806 | |
| healthy vs Groupnon-rPSC:CountryNO | -0.025 | 0.026 | 199.046 | -0.952 | 0.342 | 0.616 |
knitr::kable(results_model_simpson_detailed,digits = 3,
caption = "Raw results of independent country analysis")Table: Raw results of independent country analysis
Pielou
results_model <- pairwise.lmer(
formula = "Pielou ~ Group * Country + (1|Patient)",
factors=alpha_data$Group,
data=alpha_data)
# check interaction
if (!is.data.frame(results_model)){
results_model_pielou <- results_model[[1]]
results_model_pielou_detailed <- results_model[[2]]
} else {
results_model_pielou <- results_model
results_model_pielou_detailed <- NA
}
# save the results
pc_pielou[[segment]] <- as.data.frame(results_model_pielou)# see the results
knitr::kable(results_model_pielou,digits = 3,
caption = "Raw results of linear model of Pielou estimation.")| Estimate | Std..Error | df | t.value | Pr…t.. | p.adj | sig | |
|---|---|---|---|---|---|---|---|
| non-rPSC vs GrouprPSC | -0.007 | 0.026 | 153.863 | -0.292 | 0.771 | 0.843 | |
| non-rPSC vs rPSC - CZ vs NO | -0.052 | 0.022 | 150.735 | -2.372 | 0.019 | 0.171 | |
| non-rPSC vs GrouprPSC:CountryNO | -0.009 | 0.044 | 149.674 | -0.198 | 0.843 | 0.843 | |
| healthy vs GrouprPSC | -0.023 | 0.020 | 127.665 | -1.183 | 0.239 | 0.538 | |
| healthy vs rPSC - CZ vs NO | -0.008 | 0.017 | 129.013 | -0.457 | 0.648 | 0.843 | |
| healthy vs GrouprPSC:CountryNO | -0.052 | 0.033 | 124.565 | -1.605 | 0.111 | 0.333 | |
| healthy vs Groupnon-rPSC | -0.015 | 0.015 | 204.651 | -1.006 | 0.316 | 0.568 | |
| healthy vs non-rPSC - CZ vs NO | -0.008 | 0.019 | 206.473 | -0.416 | 0.678 | 0.843 | |
| healthy vs Groupnon-rPSC:CountryNO | -0.044 | 0.025 | 201.171 | -1.745 | 0.083 | 0.333 |
knitr::kable(results_model_pielou_detailed,digits = 3,
caption = "Raw results of independent country analysis")
|
3.2.3 Saving results
alpha_list <- list(
Richness=pc_observed[[segment]] %>% rownames_to_column("Comparison"),
Shannon=pc_shannon[[segment]] %>% rownames_to_column("Comparison"),
Simpson=pc_simpson[[segment]] %>% rownames_to_column("Comparison"),
Pielou=pc_pielou[[segment]] %>% rownames_to_column("Comparison"))
write.xlsx(alpha_list,
file = file.path(path,paste0("alpha_diversity_results_",segment,".xlsx")))3.3 Beta diversity
Calculating Aitchison distance (euclidean distance on clr-transformed data), both at ASV and genus level.
3.3.1 Main analysis
Genus level, Aitchison distance
level="genus"
# needed path
path = "../results/Q2/beta_diversity"Aggregation, filtering
genus_data <- aggregate_taxa(colon_asv_tab,
colon_taxa_tab,
taxonomic_level=level,
names=TRUE)
filt_data <- filtering_steps(genus_data[[1]],
genus_data[[2]],
colon_metadata,
seq_depth_threshold=10000)Removing 5 ASV(s)
filt_colon_genus_tab <- filt_data[[1]]
filt_colon_genus_taxa <- filt_data[[2]]
filt_colon_genus_metadata <- filt_data[[3]]3.3.1.0.1 PERMANOVA
pairwise_df <- filt_colon_genus_tab %>% column_to_rownames("SeqID") %>% t()
# main effect
pp_main <- pairwise.adonis(pairwise_df,filt_colon_genus_metadata$Group,
covariate = filt_colon_genus_metadata$Country,
patients = filt_colon_genus_metadata$Patient,
sim.method = "robust.aitchison", p.adjust.m="BH")
# interaction
pp_int <- pairwise.adonis(pairwise_df,filt_colon_genus_metadata$Group,
covariate = filt_colon_genus_metadata$Country,
interaction = TRUE,
patients = filt_colon_genus_metadata$Patient,
sim.method = "robust.aitchison", p.adjust.m="BH")
# tidy the results
pp_factor <- pp_main[[1]]
pp_cov <- pp_main[[2]]
pp_fac.cov <- pp_int[[3]]
cols <- c("pairs","Df","SumsOfSqs", "F.Model","R2","p.value", "p.adjusted", "sig")
colnames(pp_factor) <- cols; colnames(pp_cov) <- cols; colnames(pp_fac.cov) <- cols;
# save raw results
pairwise_aitchison_raw[[paste(level, segment)]] <-rbind(pp_factor,pp_cov,pp_fac.cov)# see the results
knitr::kable(pp_factor,digits = 3,caption = "PERMANOVA, GROUP separation")| pairs | Df | SumsOfSqs | F.Model | R2 | p.value | p.adjusted | sig |
|---|---|---|---|---|---|---|---|
| rPSC vs non-rPSC | 1 | 419.973 | 2.475 | 0.007 | 0.415 | 0.415 | |
| rPSC vs healthy | 1 | 1473.966 | 8.988 | 0.035 | 0.001 | 0.002 | ** |
| non-rPSC vs healthy | 1 | 1742.331 | 10.401 | 0.024 | 0.001 | 0.002 | ** |
knitr::kable(pp_cov,digits = 3,caption = "PERMANOVA, COUNTRY separation")| pairs | Df | SumsOfSqs | F.Model | R2 | p.value | p.adjusted | sig |
|---|---|---|---|---|---|---|---|
| rPSC vs non-rPSC , Country | 1 | 1556.933 | 9.177 | 0.026 | 0.001 | 0.001 | *** |
| rPSC vs healthy , Country | 1 | 883.284 | 5.386 | 0.021 | 0.001 | 0.001 | *** |
| non-rPSC vs healthy , Country | 1 | 1558.672 | 9.304 | 0.021 | 0.001 | 0.001 | *** |
knitr::kable(pp_fac.cov,digits = 3,caption = "PERMANOVA, INTERACTION GROUP:Country")| pairs | Df | SumsOfSqs | F.Model | R2 | p.value | p.adjusted | sig |
|---|---|---|---|---|---|---|---|
| rPSC vs non-rPSC : Country | 1 | 318.175 | 1.880 | 0.005 | 0.970 | 0.970 | |
| rPSC vs healthy : Country | 1 | 319.594 | 1.956 | 0.008 | 0.820 | 0.970 | |
| non-rPSC vs healthy : Country | 1 | 407.191 | 2.439 | 0.006 | 0.163 | 0.489 |
Interaction check
interaction_sig <- pp_fac.cov$pairs[pp_fac.cov$p.adjusted < 0.05]
if (length(interaction_sig)>0){
for (i in 1:length(interaction_sig)){
group1 <- unlist(strsplit(interaction_sig[i],split = " vs "))[1]
group2 <- unlist(strsplit(interaction_sig[i],split = " vs "))[2]
group2 <- unlist(strsplit(group2,split = " : "))[1]
result_list <- adonis_postanalysis(x=pairwise_df,
factors = filt_colon_genus_metadata$Group,
covariate = filt_colon_genus_metadata$Country,
group1 = group1,
group2 = group2,
patients = filt_colon_genus_metadata$Patient)
print(result_list)
}
}3.3.1.0.2 Plots
p <- pca_plot_custom(filt_colon_genus_tab,
filt_colon_genus_taxa,
filt_colon_genus_metadata,
show_boxplots = TRUE,
variable = "Group", size=2,
show_legend=FALSE)
# save the results
pca_plots_list[[paste(segment,level,"custom")]] <- p
# see the results
p3.3.1.1 Saving results
write.xlsx(pairwise_aitchison_raw[[paste(level, segment)]],
file = file.path(path,
paste0("beta_diversity_results_", segment,".xlsx")))3.3.2 Supplementary analysis
3.3.2.1 Genus level
level="genus"3.3.2.1.1 Bray-Curtis
PERMANOVA
pairwise_df <- filt_colon_genus_tab %>% column_to_rownames("SeqID") %>% t()
# main effect
pp_main <- pairwise.adonis(pairwise_df,
filt_colon_genus_metadata$Group,
covariate = filt_colon_genus_metadata$Country,
patients = filt_colon_genus_metadata$Patient,
sim.method = "bray", p.adjust.m="BH")
# interaction
pp_int <- pairwise.adonis(pairwise_df,
filt_colon_genus_metadata$Group,
covariate = filt_colon_genus_metadata$Country,
patients = filt_colon_genus_metadata$Patient,
interaction = TRUE, sim.method = "bray", p.adjust.m="BH")
# tidy the results
pp_factor <- pp_main[[1]]
pp_cov <- pp_main[[2]]
pp_fac.cov <- pp_int[[3]]
cols <- c("pairs","Df","SumsOfSqs", "F.Model","R2","p.value", "p.adjusted", "sig")
colnames(pp_factor) <- cols; colnames(pp_cov) <- cols; colnames(pp_fac.cov) <- cols;
# save raw results
supplements_beta[[paste("bray",level,segment)]] <- rbind(pp_factor,pp_cov,pp_fac.cov)# see the results
knitr::kable(pp_factor,digits = 3,caption = "PERMANOVA, GROUP separation")| pairs | Df | SumsOfSqs | F.Model | R2 | p.value | p.adjusted | sig |
|---|---|---|---|---|---|---|---|
| rPSC vs non-rPSC | 1 | 0.504 | 2.488 | 0.007 | 0.267 | 0.267 | |
| rPSC vs healthy | 1 | 2.464 | 13.675 | 0.051 | 0.001 | 0.002 | ** |
| non-rPSC vs healthy | 1 | 3.227 | 18.295 | 0.039 | 0.001 | 0.002 | ** |
knitr::kable(pp_cov,digits = 3,caption = "PERMANOVA, COUNTRY separation")| pairs | Df | SumsOfSqs | F.Model | R2 | p.value | p.adjusted | sig |
|---|---|---|---|---|---|---|---|
| rPSC vs non-rPSC , Country | 1 | 5.037 | 24.881 | 0.066 | 0.001 | 0.001 | *** |
| rPSC vs healthy , Country | 1 | 2.136 | 11.852 | 0.044 | 0.001 | 0.001 | *** |
| non-rPSC vs healthy , Country | 1 | 4.670 | 26.476 | 0.057 | 0.001 | 0.001 | *** |
knitr::kable(pp_fac.cov,digits = 3,caption = "PERMANOVA, INTERACTION GROUP:Country")| pairs | Df | SumsOfSqs | F.Model | R2 | p.value | p.adjusted | sig |
|---|---|---|---|---|---|---|---|
| rPSC vs non-rPSC : Country | 1 | 0.446 | 2.209 | 0.006 | 0.460 | 0.460 | |
| rPSC vs healthy : Country | 1 | 0.495 | 2.765 | 0.010 | 0.102 | 0.153 | |
| non-rPSC vs healthy : Country | 1 | 1.025 | 5.881 | 0.012 | 0.001 | 0.003 | ** |
Interaction check
interaction_sig <- pp_fac.cov$pairs[pp_fac.cov$p.adjusted < 0.05]
for (i in 1:length(interaction_sig)){
group1 <- unlist(strsplit(interaction_sig[i],split = " vs "))[1]
group2 <- unlist(strsplit(interaction_sig[i],split = " vs "))[2]
group2 <- unlist(strsplit(group2,split = " : "))[1]
result_list <- adonis_postanalysis(x=pairwise_df,
factors = filt_colon_genus_metadata$Group,
covariate = filt_colon_genus_metadata$Country,
patients = filt_colon_genus_metadata$Patient,
group1 = group1,
group2 = group2,
sim.method = 'bray')
print(result_list)
}$`non-rPSC_healthy_CZ`
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: supplied matrix
Number of permutations: 999
adonis2(formula = x1 ~ Fac, data = x2, permutations = perm)
Df SumOfSqs R2 F Pr(>F) p.adjusted
Fac 1 1.415 0.03183 8.3495 0.001 0.001
Residual 254 43.052 0.96817
Total 255 44.467 1.00000
$`non-rPSC_healthy_NO`
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: supplied matrix
Number of permutations: 999
adonis2(formula = x1 ~ Fac, data = x2, permutations = perm)
Df SumOfSqs R2 F Pr(>F) p.adjusted
Fac 1 2.837 0.08594 15.607 0.001 0.001
Residual 166 30.175 0.91406
Total 167 33.012 1.00000
$`non-rPSC_CZ_vs_NO`
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: supplied matrix
Number of permutations: 999
adonis2(formula = x1 ~ Cov, data = x2, permutations = perm)
Df SumOfSqs R2 F Pr(>F) p.adjusted
Cov 1 4.273 0.0792 22.449 0.001 0.001
Residual 261 49.683 0.9208
Total 262 53.956 1.0000
$healthy_CZ_vs_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: supplied matrix
Number of permutations: 999
adonis2(formula = x1 ~ Cov, data = x2, permutations = perm)
Df SumOfSqs R2 F Pr(>F) p.adjusted
Cov 1 1.4216 0.05694 9.6009 0.001 0.001
Residual 159 23.5437 0.94306
Total 160 24.9653 1.00000
Plots
p <- pca_plot_custom(filt_colon_genus_tab,
filt_colon_genus_taxa,
filt_colon_genus_metadata,
measure = "bray",
show_boxplots = TRUE,
variable = "Group", size=3, show_legend=FALSE)
# save the results
supplements_beta[[paste("PCoA bray",level,segment)]] <- p
# see the results
p3.3.2.1.2 Jaccard
PERMANOVA
pairwise_df <- filt_colon_genus_tab %>% column_to_rownames("SeqID") %>% t()
# main effect
pp_main <- pairwise.adonis(pairwise_df,
filt_colon_genus_metadata$Group,
covariate = filt_colon_genus_metadata$Country,
patients = filt_colon_genus_metadata$Patient,
sim.method = "jaccard", p.adjust.m="BH")
# interaction
pp_int <- pairwise.adonis(pairwise_df,
filt_colon_genus_metadata$Group,
covariate = filt_colon_genus_metadata$Country,
patients = filt_colon_genus_metadata$Patient,
interaction = TRUE, sim.method = "jaccard", p.adjust.m="BH")
# tidy the results
pp_factor <- pp_main[[1]]
pp_cov <- pp_main[[2]]
pp_fac.cov <- pp_int[[3]]
cols <- c("pairs","Df","SumsOfSqs", "F.Model","R2","p.value", "p.adjusted", "sig")
colnames(pp_factor) <- cols; colnames(pp_cov) <- cols; colnames(pp_fac.cov) <- cols;
# save raw results
supplements_beta[[paste("jaccard",level,segment)]] <- rbind(pp_factor,
pp_cov,
pp_fac.cov)# see the results
knitr::kable(pp_factor,digits = 3,caption = "PERMANOVA, GROUP separation")| pairs | Df | SumsOfSqs | F.Model | R2 | p.value | p.adjusted | sig |
|---|---|---|---|---|---|---|---|
| rPSC vs non-rPSC | 1 | 0.590 | 2.035 | 0.006 | 0.594 | 0.594 | |
| rPSC vs healthy | 1 | 2.438 | 9.075 | 0.035 | 0.001 | 0.002 | ** |
| non-rPSC vs healthy | 1 | 3.380 | 12.658 | 0.028 | 0.001 | 0.002 | ** |
knitr::kable(pp_cov,digits = 3,caption = "PERMANOVA, COUNTRY separation")| pairs | Df | SumsOfSqs | F.Model | R2 | p.value | p.adjusted | sig |
|---|---|---|---|---|---|---|---|
| rPSC vs non-rPSC , Country | 1 | 4.994 | 17.225 | 0.047 | 0.001 | 0.001 | *** |
| rPSC vs healthy , Country | 1 | 2.338 | 8.704 | 0.033 | 0.001 | 0.001 | *** |
| non-rPSC vs healthy , Country | 1 | 4.864 | 18.215 | 0.040 | 0.001 | 0.001 | *** |
knitr::kable(pp_fac.cov,digits = 3,caption = "PERMANOVA, INTERACTION GROUP:Country")| pairs | Df | SumsOfSqs | F.Model | R2 | p.value | p.adjusted | sig |
|---|---|---|---|---|---|---|---|
| rPSC vs non-rPSC : Country | 1 | 0.589 | 2.039 | 0.006 | 0.696 | 0.696 | |
| rPSC vs healthy : Country | 1 | 0.632 | 2.365 | 0.009 | 0.208 | 0.312 | |
| non-rPSC vs healthy : Country | 1 | 1.174 | 4.433 | 0.010 | 0.001 | 0.003 | ** |
Interaction check
interaction_sig <- pp_fac.cov$pairs[pp_fac.cov$p.adjusted < 0.05]
for (i in 1:length(interaction_sig)){
group1 <- unlist(strsplit(interaction_sig[i],split = " vs "))[1]
group2 <- unlist(strsplit(interaction_sig[i],split = " vs "))[2]
group2 <- unlist(strsplit(group2,split = " : "))[1]
result_list <- adonis_postanalysis(x=pairwise_df,
factors = filt_colon_genus_metadata$Group,
covariate = filt_colon_genus_metadata$Country,
patients = filt_colon_genus_metadata$Patient,
group1 = group1,
group2 = group2,
sim.method = 'jaccard')
print(result_list)
}$`non-rPSC_healthy_CZ`
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: supplied matrix
Number of permutations: 999
adonis2(formula = x1 ~ Fac, data = x2, permutations = perm)
Df SumOfSqs R2 F Pr(>F) p.adjusted
Fac 1 1.591 0.02354 6.1232 0.001 0.001
Residual 254 65.982 0.97646
Total 255 67.573 1.00000
$`non-rPSC_healthy_NO`
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: supplied matrix
Number of permutations: 999
adonis2(formula = x1 ~ Fac, data = x2, permutations = perm)
Df SumOfSqs R2 F Pr(>F) p.adjusted
Fac 1 2.963 0.06145 10.869 0.001 0.001
Residual 166 45.256 0.93855
Total 167 48.219 1.00000
$`non-rPSC_CZ_vs_NO`
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: supplied matrix
Number of permutations: 999
adonis2(formula = x1 ~ Cov, data = x2, permutations = perm)
Df SumOfSqs R2 F Pr(>F) p.adjusted
Cov 1 4.325 0.05591 15.458 0.001 0.001
Residual 261 73.032 0.94409
Total 262 77.358 1.00000
$healthy_CZ_vs_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: supplied matrix
Number of permutations: 999
adonis2(formula = x1 ~ Cov, data = x2, permutations = perm)
Df SumOfSqs R2 F Pr(>F) p.adjusted
Cov 1 1.712 0.0429 7.1261 0.001 0.001
Residual 159 38.206 0.9571
Total 160 39.918 1.0000
Plots
p <- pca_plot_custom(filt_colon_genus_tab,
filt_colon_genus_taxa,
filt_colon_genus_metadata,
measure = "jaccard",
show_boxplots = TRUE,
variable = "Group", size=3, show_legend=FALSE)
# save the results
supplements_beta[[paste("PCoA jaccard",level,segment)]] <- p
# see the results
p3.3.2.2 ASV level
level="ASV"3.3.2.2.1 Aitchison
# preparing data frame
pairwise_df <- filt_colon_asv_tab %>% column_to_rownames("SeqID") %>% t()
# main effect
pp_main <- pairwise.adonis(x=pairwise_df,
filt_colon_metadata$Group,
covariate = filt_colon_metadata$Country,
sim.method = "robust.aitchison",
p.adjust.m="BH",
patients = filt_colon_metadata$Patient)
# interaction
pp_int <- pairwise.adonis(pairwise_df,filt_colon_metadata$Group,
covariate = filt_colon_metadata$Country,
interaction = TRUE,
sim.method = "robust.aitchison",
p.adjust.m="BH",
patients = filt_colon_metadata$Patient)
pp_factor <- pp_main[[1]]
pp_cov <- pp_main[[2]]
pp_fac.cov <- pp_int[[3]]
cols <- c("pairs","Df","SumsOfSqs", "F.Model","R2","p.value", "p.adjusted", "sig")
colnames(pp_factor) <- cols; colnames(pp_cov) <- cols; colnames(pp_fac.cov) <- cols;
# save raw results
supplements_beta[[paste("aitchison",level,segment)]] <- rbind(pp_factor,
pp_cov,
pp_fac.cov)
# see the results
pp_factor pairs Df SumsOfSqs F.Model R2 p.value p.adjusted sig
1 rPSC vs non-rPSC 1 470.424 1.892872 0.005323748 0.968 0.9680
2 rPSC vs healthy 1 1842.677 6.978727 0.027237749 0.001 0.0015 **
3 non-rPSC vs healthy 1 2475.754 9.499859 0.021710892 0.001 0.0015 **
pp_cov pairs Df SumsOfSqs F.Model R2 p.value
1 rPSC vs non-rPSC , Country 1 1644.116 6.615527 0.01860632 0.001
2 rPSC vs healthy , Country 1 1119.321 4.239175 0.01654537 0.001
3 non-rPSC vs healthy , Country 1 1833.401 7.035050 0.01607784 0.001
p.adjusted sig
1 0.001 ***
2 0.001 ***
3 0.001 ***
pp_fac.cov pairs Df SumsOfSqs F.Model R2 p.value
1 rPSC vs non-rPSC : Country 1 470.8808 1.899623 0.005328918 0.989
2 rPSC vs healthy : Country 1 517.8884 1.969144 0.007655226 0.896
3 non-rPSC vs healthy : Country 1 611.7986 2.355124 0.005365111 0.163
p.adjusted sig
1 0.989
2 0.989
3 0.489
# see the results
knitr::kable(pp_factor,digits = 3,caption = "PERMANOVA, GROUP separation")| pairs | Df | SumsOfSqs | F.Model | R2 | p.value | p.adjusted | sig |
|---|---|---|---|---|---|---|---|
| rPSC vs non-rPSC | 1 | 470.424 | 1.893 | 0.005 | 0.968 | 0.968 | |
| rPSC vs healthy | 1 | 1842.677 | 6.979 | 0.027 | 0.001 | 0.002 | ** |
| non-rPSC vs healthy | 1 | 2475.754 | 9.500 | 0.022 | 0.001 | 0.002 | ** |
knitr::kable(pp_cov,digits = 3,caption = "PERMANOVA, COUNTRY separation")| pairs | Df | SumsOfSqs | F.Model | R2 | p.value | p.adjusted | sig |
|---|---|---|---|---|---|---|---|
| rPSC vs non-rPSC , Country | 1 | 1644.116 | 6.616 | 0.019 | 0.001 | 0.001 | *** |
| rPSC vs healthy , Country | 1 | 1119.321 | 4.239 | 0.017 | 0.001 | 0.001 | *** |
| non-rPSC vs healthy , Country | 1 | 1833.401 | 7.035 | 0.016 | 0.001 | 0.001 | *** |
knitr::kable(pp_fac.cov,digits = 3,caption = "PERMANOVA, INTERACTION GROUP:Country")| pairs | Df | SumsOfSqs | F.Model | R2 | p.value | p.adjusted | sig |
|---|---|---|---|---|---|---|---|
| rPSC vs non-rPSC : Country | 1 | 470.881 | 1.900 | 0.005 | 0.989 | 0.989 | |
| rPSC vs healthy : Country | 1 | 517.888 | 1.969 | 0.008 | 0.896 | 0.989 | |
| non-rPSC vs healthy : Country | 1 | 611.799 | 2.355 | 0.005 | 0.163 | 0.489 |
Interaction check
interaction_sig <- pp_fac.cov$pairs[pp_fac.cov$p.adjusted < 0.05]
if (length(interaction_sig)>0){
for (i in 1:length(interaction_sig)){
group1 <- unlist(strsplit(interaction_sig[i],split = " vs "))[1]
group2 <- unlist(strsplit(interaction_sig[i],split = " vs "))[2]
group2 <- unlist(strsplit(group2,split = " : "))[1]
result_list <- adonis_postanalysis(x=pairwise_df,
factors = filt_colon_metadata$Group,
covariate = filt_colon_metadata$Country,
group1 = group1,
group2 = group2,
patients = filt_colon_metadata$Patient)
print(result_list)
}
}PCoA
p <- pca_plot_custom(filt_colon_asv_tab,
filt_colon_taxa_tab,
filt_colon_metadata,
show_boxplots = TRUE,
variable = "Group",
size=3,
show_legend=FALSE)
# save the results
supplements_beta[[paste("PCoA aitchison",level,segment)]] <- p
# see the results
p3.3.2.2.2 Bray-Curtis
PERMANOVA
# preparing data frame
pairwise_df <- filt_colon_asv_tab %>% column_to_rownames("SeqID") %>% t()
# main effect
pp_main <- pairwise.adonis(pairwise_df,
filt_colon_metadata$Group,
covariate = filt_colon_metadata$Country,
patients = filt_colon_metadata$Patient,
sim.method = "bray", p.adjust.m="BH")
# interaction
pp_int <- pairwise.adonis(pairwise_df,
filt_colon_metadata$Group,
covariate = filt_colon_metadata$Country,
patients = filt_colon_metadata$Patient,
interaction = TRUE, sim.method = "bray", p.adjust.m="BH")
pp_factor <- pp_main[[1]]
pp_cov <- pp_main[[2]]
pp_fac.cov <- pp_int[[3]]
cols <- c("pairs","Df","SumsOfSqs", "F.Model","R2","p.value", "p.adjusted", "sig")
colnames(pp_factor) <- cols; colnames(pp_cov) <- cols; colnames(pp_fac.cov) <- cols;
# save raw results
supplements_beta[[paste("bray",level,segment)]] <- rbind(pp_factor,
pp_cov,
pp_fac.cov)# see the results
knitr::kable(pp_factor,digits = 3,caption = "PERMANOVA, GROUP separation")| pairs | Df | SumsOfSqs | F.Model | R2 | p.value | p.adjusted | sig |
|---|---|---|---|---|---|---|---|
| rPSC vs non-rPSC | 1 | 0.737 | 2.327 | 0.006 | 0.415 | 0.415 | |
| rPSC vs healthy | 1 | 3.308 | 10.870 | 0.041 | 0.001 | 0.002 | ** |
| non-rPSC vs healthy | 1 | 4.794 | 16.082 | 0.036 | 0.001 | 0.002 | ** |
knitr::kable(pp_cov,digits = 3,caption = "PERMANOVA, COUNTRY separation")| pairs | Df | SumsOfSqs | F.Model | R2 | p.value | p.adjusted | sig |
|---|---|---|---|---|---|---|---|
| rPSC vs non-rPSC , Country | 1 | 4.600 | 14.526 | 0.040 | 0.001 | 0.001 | *** |
| rPSC vs healthy , Country | 1 | 2.184 | 7.177 | 0.027 | 0.001 | 0.001 | *** |
| non-rPSC vs healthy , Country | 1 | 4.330 | 14.527 | 0.032 | 0.001 | 0.001 | *** |
knitr::kable(pp_fac.cov,digits = 3,caption = "PERMANOVA, INTERACTION GROUP:Country")| pairs | Df | SumsOfSqs | F.Model | R2 | p.value | p.adjusted | sig |
|---|---|---|---|---|---|---|---|
| rPSC vs non-rPSC : Country | 1 | 0.792 | 2.512 | 0.007 | 0.383 | 0.383 | |
| rPSC vs healthy : Country | 1 | 0.815 | 2.697 | 0.010 | 0.145 | 0.217 | |
| non-rPSC vs healthy : Country | 1 | 1.360 | 4.601 | 0.010 | 0.002 | 0.006 | ** |
Interaction check
interaction_sig <- pp_fac.cov$pairs[pp_fac.cov$p.adjusted < 0.05]
for (i in 1:length(interaction_sig)){
group1 <- unlist(strsplit(interaction_sig[i],split = " vs "))[1]
group2 <- unlist(strsplit(interaction_sig[i],split = " vs "))[2]
group2 <- unlist(strsplit(group2,split = " : "))[1]
result_list <- adonis_postanalysis(x=pairwise_df,
factors = filt_colon_metadata$Group,
covariate = filt_colon_metadata$Country,
group1 = group1,
group2 = group2,
patients = filt_colon_metadata$Patient,
sim.method = 'bray')
print(result_list)
}$`non-rPSC_healthy_CZ`
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: supplied matrix
Number of permutations: 999
adonis2(formula = x1 ~ Fac, data = x2, permutations = perm)
Df SumOfSqs R2 F Pr(>F) p.adjusted
Fac 1 2.255 0.029 7.5854 0.001 0.001
Residual 254 75.510 0.971
Total 255 77.765 1.000
$`non-rPSC_healthy_NO`
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: supplied matrix
Number of permutations: 999
adonis2(formula = x1 ~ Fac, data = x2, permutations = perm)
Df SumOfSqs R2 F Pr(>F) p.adjusted
Fac 1 3.899 0.07422 13.309 0.001 0.001
Residual 166 48.631 0.92578
Total 167 52.530 1.00000
$`non-rPSC_CZ_vs_NO`
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: supplied matrix
Number of permutations: 999
adonis2(formula = x1 ~ Cov, data = x2, permutations = perm)
Df SumOfSqs R2 F Pr(>F) p.adjusted
Cov 1 4.042 0.04823 13.227 0.001 0.001
Residual 261 79.749 0.95177
Total 262 83.790 1.00000
$healthy_CZ_vs_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: supplied matrix
Number of permutations: 999
adonis2(formula = x1 ~ Cov, data = x2, permutations = perm)
Df SumOfSqs R2 F Pr(>F) p.adjusted
Cov 1 1.649 0.03581 5.9053 0.001 0.001
Residual 159 44.393 0.96419
Total 160 46.041 1.00000
PCoA
p <- pca_plot_custom(filt_colon_asv_tab,
filt_colon_taxa_tab,
filt_colon_metadata,
measure = "bray",
show_boxplots = TRUE,
variable = "Group", size=3, show_legend=FALSE)
# save the results
supplements_beta[[paste("PCoA bray",level,segment)]] <- p
# see the results
p3.3.2.2.3 Jaccard
PERMANOVA
# preparing data frame
pairwise_df <- filt_colon_asv_tab %>% column_to_rownames("SeqID") %>% t()
# main effect
pp_main <- pairwise.adonis(pairwise_df,
filt_colon_metadata$Group,
covariate = filt_colon_metadata$Country,
patients = filt_colon_metadata$Patient,
sim.method = "jaccard", p.adjust.m="BH")
# interaction
pp_int <- pairwise.adonis(pairwise_df,
filt_colon_metadata$Group,
covariate = filt_colon_metadata$Country,
patients = filt_colon_metadata$Patient,
interaction = TRUE, sim.method = "jaccard", p.adjust.m="BH")
pp_factor <- pp_main[[1]]
pp_cov <- pp_main[[2]]
pp_fac.cov <- pp_int[[3]]
cols <- c("pairs","Df","SumsOfSqs", "F.Model","R2","p.value", "p.adjusted", "sig")
colnames(pp_factor) <- cols; colnames(pp_cov) <- cols; colnames(pp_fac.cov) <- cols;
# save raw results
supplements_beta[[paste("jaccard",level,segment)]] <- rbind(pp_factor,
pp_cov,
pp_fac.cov)# see the results
knitr::kable(pp_factor,digits = 3,caption = "PERMANOVA, GROUP separation")| pairs | Df | SumsOfSqs | F.Model | R2 | p.value | p.adjusted | sig |
|---|---|---|---|---|---|---|---|
| rPSC vs non-rPSC | 1 | 0.737 | 1.909 | 0.005 | 0.873 | 0.873 | |
| rPSC vs healthy | 1 | 2.563 | 6.794 | 0.026 | 0.001 | 0.002 | ** |
| non-rPSC vs healthy | 1 | 3.757 | 10.047 | 0.023 | 0.001 | 0.002 | ** |
knitr::kable(pp_cov,digits = 3,caption = "PERMANOVA, COUNTRY separation")| pairs | Df | SumsOfSqs | F.Model | R2 | p.value | p.adjusted | sig |
|---|---|---|---|---|---|---|---|
| rPSC vs non-rPSC , Country | 1 | 3.698 | 9.578 | 0.027 | 0.001 | 0.001 | *** |
| rPSC vs healthy , Country | 1 | 1.887 | 5.001 | 0.019 | 0.001 | 0.001 | *** |
| non-rPSC vs healthy , Country | 1 | 3.508 | 9.381 | 0.021 | 0.001 | 0.001 | *** |
knitr::kable(pp_fac.cov,digits = 3,caption = "PERMANOVA, INTERACTION GROUP:Country")| pairs | Df | SumsOfSqs | F.Model | R2 | p.value | p.adjusted | sig |
|---|---|---|---|---|---|---|---|
| rPSC vs non-rPSC : Country | 1 | 0.814 | 2.115 | 0.006 | 0.822 | 0.822 | |
| rPSC vs healthy : Country | 1 | 0.853 | 2.271 | 0.009 | 0.388 | 0.582 | |
| non-rPSC vs healthy : Country | 1 | 1.296 | 3.486 | 0.008 | 0.002 | 0.006 | ** |
Interaction check
interaction_sig <- pp_fac.cov$pairs[pp_fac.cov$p.adjusted < 0.05]
for (i in 1:length(interaction_sig)){
group1 <- unlist(strsplit(interaction_sig[i],split = " vs "))[1]
group2 <- unlist(strsplit(interaction_sig[i],split = " vs "))[2]
group2 <- unlist(strsplit(group2,split = " : "))[1]
result_list <- adonis_postanalysis(x=pairwise_df,
factors = filt_colon_metadata$Group,
covariate = filt_colon_metadata$Country,
patients = filt_colon_metadata$Patient,
group1 = group1,
group2 = group2,
sim.method = 'jaccard')
print(result_list)
}$`non-rPSC_healthy_CZ`
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: supplied matrix
Number of permutations: 999
adonis2(formula = x1 ~ Fac, data = x2, permutations = perm)
Df SumOfSqs R2 F Pr(>F) p.adjusted
Fac 1 1.890 0.01954 5.0623 0.001 0.001
Residual 254 94.839 0.98046
Total 255 96.729 1.00000
$`non-rPSC_healthy_NO`
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: supplied matrix
Number of permutations: 999
adonis2(formula = x1 ~ Fac, data = x2, permutations = perm)
Df SumOfSqs R2 F Pr(>F) p.adjusted
Fac 1 3.163 0.04907 8.5656 0.001 0.001
Residual 166 61.293 0.95093
Total 167 64.456 1.00000
$`non-rPSC_CZ_vs_NO`
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: supplied matrix
Number of permutations: 999
adonis2(formula = x1 ~ Cov, data = x2, permutations = perm)
Df SumOfSqs R2 F Pr(>F) p.adjusted
Cov 1 3.288 0.03219 8.6816 0.001 0.001
Residual 261 98.853 0.96781
Total 262 102.141 1.00000
$healthy_CZ_vs_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: supplied matrix
Number of permutations: 999
adonis2(formula = x1 ~ Cov, data = x2, permutations = perm)
Df SumOfSqs R2 F Pr(>F) p.adjusted
Cov 1 1.516 0.02578 4.2075 0.001 0.001
Residual 159 57.279 0.97422
Total 160 58.794 1.00000
PCoA
p <- pca_plot_custom(filt_colon_asv_tab,
filt_colon_taxa_tab,
filt_colon_metadata,
measure = "jaccard",
show_boxplots = TRUE,
variable = "Group", size=3, show_legend=FALSE)
# save the results
supplements_beta[[paste("PCoA jaccard",level,segment)]] <- p
# see the results
p3.3.2.3 Saving results
write.xlsx(supplements_beta[!grepl("PCoA",names(supplements_beta))],
file = file.path(path,
paste0("supplements_beta_diversity_", segment,".xlsx")))3.4 Univariate Analysis
3.4.1 Main - Genus level
level="genus"
# needed paths
path = "../results/Q2/univariate_analysis"
path_maaslin=file.path("../intermediate_files/maaslin/Q2",level)
# variables
raw_linda_results_genus[[segment]] <- list()
linda_results_genus[[segment]] <- list()
# country and interaction problems
list_country_union <- list()
list_venns <- list()
# workbook for final df
wb <- createWorkbook()
# PSC effect
rpsc_effect <- list()3.4.1.1 Genus level
level="genus"Aggregate taxa
genus_data <- aggregate_taxa(colon_asv_tab,
colon_taxa_tab,
taxonomic_level = level)
colon_genus_tab <- genus_data[[1]]
colon_genus_taxa_tab <- genus_data[[2]]
colon_genus_asv_taxa_tab <- create_asv_taxa_table(colon_genus_tab,
colon_genus_taxa_tab)3.4.1.1.1 rPSC vs non-rPSC
3.4.1.1.1.1 linDA
group <- c("non-rPSC","rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])taxa_to_test <- read.xlsx("../results/Q1/univariate_analysis/significant_taxa_colon.xlsx", sheet=" genus healthy vs pre_ltx")$SeqID
colon_genus_tab_2 <- colon_genus_tab %>% dplyr::filter(SeqID %in% taxa_to_test)
colon_genus_taxa_tab_2 <- colon_genus_taxa_tab %>% dplyr::filter(SeqID %in% taxa_to_test)
# prepare the data
linda_data <- binomial_prep(colon_genus_tab_2,
colon_genus_taxa_tab_2,
colon_metadata,
group, usage="linDA",filtering = FALSE)
filt_colon_uni_data <- linda_data[[1]]
filt_colon_uni_taxa <- linda_data[[2]]
filt_colon_uni_metadata <- linda_data[[3]]
# fit the model
linda.obj <- linda(filt_colon_uni_data,
filt_colon_uni_metadata,
formula = '~ Group * Country + (1|Patient)')0 features are filtered!
The filtered data has 231 samples and 33 features will be tested!
Imputation approach is used.
Fit linear mixed effects models ...
Completed.
linda.output <- linda.obj$output
linda.output <- linda_renaming(linda.output, group)
# save the results
group1 <- paste0(group[1], " vs ","Group",group[2])
group2 <- paste0(group[1], " , ",group[2], " - CZ vs NO")
group3 <- paste0(group[1], " vs ","Group",group[2], ":CountryNO")
for (grp in c(group1,group2,group3)){
raw_linda_results_genus[[segment]][[grp]] <-
rawlinda.df(linda.output,
grp,
filt_colon_uni_data,
filt_colon_uni_taxa)
linda_results_genus[[segment]][[grp]] <-
linda.df(linda.output,
grp,
filt_colon_uni_data,
filt_colon_uni_taxa)
}# volcano plot
volcano_1 <- volcano_plot_linda(linda.output, group1,
taxa_table = filt_colon_uni_taxa) +
ggtitle(paste(group,collapse=" vs "))
volcano_2 <- volcano_plot_linda(linda.output, group2,
taxa_table = filt_colon_uni_taxa) +
ggtitle("NO vs CZ")
volcano_3 <- volcano_plot_linda(linda.output, group3,
taxa_table = filt_colon_uni_taxa) +
ggtitle("Interaction effect")
volcano <- ggarrange(volcano_1,volcano_2,volcano_3, ncol=3)
# see the plot
volcano3.4.1.1.1.2 MaAsLin2
volcano1 <- volcano_plot_maaslin(fit_data,filt_colon_uni_taxa) +
ggtitle(paste(group[1], "vs", group[2]))
volcano2 <- volcano_plot_maaslin(fit_data,filt_colon_uni_taxa,variable="Country") +
ggtitle("NO vs CZ")
volcano <- ggarrange(volcano1,volcano2, ncol=2)
volcano3.4.1.1.1.3 Group - Intersection
intersection_results <- group_intersection(group, list_intersections, list_venns,
linda.output, fit_data,
raw_linda_results_genus,
segment = segment,
level=level)
list_intersections <- intersection_results[[1]]
list_venns <- intersection_results[[2]]
venn <- intersection_results[[3]]
# show the results
venn3.4.1.1.1.4 Interaction effect
list_interaction_significant <- country_interaction(group,
linda.output,
list_intersections,
filt_colon_uni_data,
filt_colon_uni_metadata,
segment=segment,
level=level)
# see the result
## significant interaction effect
list_interaction_significant[[1]]NULL
## results for czech cohort
list_interaction_significant[[2]][1] NA
## results for norwegian cohort
list_interaction_significant[[3]][1] NA
Removing problematic taxa
list_intersections <- removing_interaction_problems(group,
list_interaction_significant,
list_intersections,
segment=segment,
level=level)3.4.1.1.2 Basic statistics
uni_df <- merge(basic_univariate_statistics(linda_data,group),
raw_linda_results_genus[[segment]][[group1]],
by="SeqID",all=TRUE)[1] "non-rPSC"
[1] "rPSC"
[1] "non-rPSC"
[1] "rPSC"
uni_df[["final_sig"]] <- uni_df$SeqID %in% list_intersections[[paste(segment,level,comparison_name)]][["SeqID"]]
uni_statistics[[segment]][[paste(level,comparison_name)]] <- uni_df
# for comparison
new_name <- comparison_name
addWorksheet(wb, sheetName = new_name)
writeData(wb, sheet = new_name, uni_df, rowNames=FALSE)3.4.1.1.3 Prevalence testing
uni_df$prevalence_diff <- abs(uni_df$`PREVALENCE_ rPSC` - uni_df$`PREVALENCE_ non-rPSC`)
#uni_df %<>% dplyr::filter(prevalence_diff >=0.2)
#filt_colon_uni_data <- filt_colon_uni_data[uni_df$SeqID,]
filt_colon_uni_data <- ifelse(filt_colon_uni_data>0,"YES","NO")
filt_colon_uni_data %<>% t() %>% as.data.frame()
#filt_colon_uni_taxa %<>% dplyr::filter(SeqID %in% uni_df$SeqID)
filt_colon_uni_data <- merge(filt_colon_uni_data %>% rownames_to_column("SampleID"),filt_colon_metadata,by="SampleID",all.x=TRUE) %>% column_to_rownames("SampleID") %>% distinct(Patient, .keep_all = TRUE)
p_values <- c()
for (taxon in uni_df$SeqID){
data <- filt_colon_uni_data %>%
group_by(!!sym(taxon), Group) %>%
count() %>%
group_by(!!sym(taxon), Group) %>%
summarise(n = sum(n), .groups = "drop") %>%
pivot_wider(names_from = Group, values_from = n) %>%
column_to_rownames(taxon) %>%
as.matrix()
data
chi_res <- chisq.test(data)
chi_res$expected
#p_values <- c(p_values,chi_res$p.value)
test <- fisher.test(data)
test$p.value
p_values <- c(p_values,test$p.value)
}Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
p_values_adjusted <- p.adjust(p_values,method = "BH")
names(p_values_adjusted) <- uni_df$SeqID
names(p_values) <- uni_df$SeqID
p_values_adjusted[which(p_values_adjusted<0.05)] Bilophila
0.01992028
asv_tab <- as.data.frame(rrarefy(
linda_data[[1]] %>% t(),
sample=10000)) %>% t() %>% as.data.frame()
asv_tab <- ifelse(asv_tab>10,"YES","NO")
asv_tab %<>% t() %>% as.data.frame()
set.seed(123)
asv_tab <- merge(asv_tab %>% rownames_to_column("SampleID"),filt_colon_metadata,by="SampleID",all.x=TRUE) %>% column_to_rownames("SampleID") %>% distinct(Patient, .keep_all = TRUE)
p_values <- c()
for (taxon in uni_df$SeqID){
data <- asv_tab %>%
group_by(!!sym(taxon), Group) %>%
count() %>%
group_by(!!sym(taxon), Group) %>%
summarise(n = sum(n), .groups = "drop") %>%
pivot_wider(names_from = Group, values_from = n) %>%
column_to_rownames(taxon) %>%
as.matrix()
data2 <- asv_tab %>%
group_by(!!sym(taxon), Group,Country) %>%
count()
data[is.na(data)] <- 0
chi_res <- chisq.test(data)
chi_res$expected
test <- fisher.test(data)
test$p.value
chi_res$p.value
p_values <- c(p_values,test$p.value)
}Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
p_values_adjusted <- p.adjust(p_values,method = "BH")
names(p_values_adjusted) <- uni_df$SeqID
names(p_values) <- uni_df$SeqID
p_values_adjusted[which(p_values_adjusted<0.05)] Bilophila Enterococcus
0.03417286 0.03417286
df <- data.frame(SeqID=uni_df$SeqID,
p=p_values,
p_adj=p_values_adjusted)prevalence_df <- uni_df[,c("SeqID","PREVALENCE_ non-rPSC","PREVALENCE_ rPSC")]
prevalence_df <- merge(prevalence_df,df,by="SeqID",all=TRUE)
write.xlsx(prevalence_df,file.path(path,paste0("supplements_prevalence_",segment,".xlsx")))3.4.1.2 rPSC vs healthy
group <- c("healthy","rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])3.4.1.2.1 linDA
# prepare the data
linda_data <- binomial_prep(colon_genus_tab,
colon_genus_taxa_tab,
colon_metadata,
group, usage="linDA")Removing 98 ASV(s)
filt_colon_uni_data <- linda_data[[1]]
filt_colon_uni_taxa <- linda_data[[2]]
filt_colon_uni_metadata <- linda_data[[3]]
# fit the model
linda.obj <- linda(filt_colon_uni_data,
filt_colon_uni_metadata,
formula = '~ Group * Country + (1|Patient)')0 features are filtered!
The filtered data has 248 samples and 178 features will be tested!
Imputation approach is used.
Fit linear mixed effects models ...
Completed.
linda.output <- linda.obj$output
linda.output <- linda_renaming(linda.output, group)
# save the results
group1 <- paste0(group[1], " vs ","Group",group[2])
group2 <- paste0(group[1], " , ",group[2], " - CZ vs NO")
group3 <- paste0(group[1], " vs ","Group",group[2], ":CountryNO")
for (grp in c(group1,group2,group3)){
raw_linda_results_genus[[segment]][[grp]] <-
rawlinda.df(linda.output,
grp,
filt_colon_uni_data,
filt_colon_uni_taxa)
linda_results_genus[[segment]][[grp]] <-
linda.df(linda.output,
grp,
filt_colon_uni_data,
filt_colon_uni_taxa)
}# volcano plot
volcano_1 <- volcano_plot_linda(linda.output, group1,
taxa_table = filt_colon_uni_taxa) +
ggtitle(paste(group,collapse=" vs "))
volcano_2 <- volcano_plot_linda(linda.output, group2,
taxa_table = filt_colon_uni_taxa) +
ggtitle("NO vs CZ")
volcano_3 <- volcano_plot_linda(linda.output, group3,
taxa_table = filt_colon_uni_taxa) +
ggtitle("Interaction effect")
volcano <- ggarrange(volcano_1,volcano_2,volcano_3, ncol=3)
# see the plot
volcano3.4.1.2.1.1 MaAsLin2
volcano1 <- volcano_plot_maaslin(fit_data,filt_colon_uni_taxa) +
ggtitle(paste(group[1], "vs", group[2]))
volcano2 <- volcano_plot_maaslin(fit_data,filt_colon_uni_taxa,variable="Country") +
ggtitle("NO vs CZ")
volcano <- ggarrange(volcano1,volcano2, ncol=2)
volcano3.4.1.2.1.2 Group - Intersection
intersection_results <- group_intersection(group,
list_intersections,
list_venns,
linda.output, fit_data,
raw_linda_results_genus,
segment = segment,
level=level)
list_intersections <- intersection_results[[1]]
list_venns <- intersection_results[[2]]
venn <- intersection_results[[3]]
# show the results
venn3.4.1.2.1.3 Interaction effect
list_interaction_significant <- country_interaction(group,
linda.output,
list_intersections,
filt_colon_uni_data,
filt_colon_uni_metadata,
segment=segment,
level=level)
# see the result
## significant interaction effect
list_interaction_significant[[1]]NULL
## results for czech cohort
list_interaction_significant[[2]][1] NA
## results for norwegian cohort
list_interaction_significant[[3]][1] NA
Removing problematic taxa
list_intersections <- removing_interaction_problems(group,
list_interaction_significant,
list_intersections,
segment=segment,
level=level)3.4.1.2.2 Basic statistics
uni_df <- merge(basic_univariate_statistics(linda_data,group),
raw_linda_results_genus[[segment]][[group1]],
by="SeqID",all=TRUE)[1] "healthy"
[1] "rPSC"
[1] "healthy"
[1] "rPSC"
uni_df[["final_sig"]] <- uni_df$SeqID %in% list_intersections[[paste(segment,level,comparison_name)]][["SeqID"]]
uni_statistics[[segment]][[paste(level,comparison_name)]] <- uni_df
# for comparison
new_name <- comparison_name
addWorksheet(wb, sheetName = new_name)
writeData(wb, sheet = new_name, uni_df, rowNames=FALSE)3.4.1.2.3 non-rPSC vs healthy
group <- c("healthy","non-rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])3.4.1.2.3.1 linDA
# prepare the data
linda_data <- binomial_prep(colon_genus_tab,
colon_genus_taxa_tab,
colon_metadata,group,
usage="linDA")Removing 39 ASV(s)
Removing 6 ASV(s)
filt_colon_uni_data <- linda_data[[1]]
filt_colon_uni_taxa <- linda_data[[2]]
filt_colon_uni_metadata <- linda_data[[3]]
# fit the model
linda.obj <- linda(filt_colon_uni_data,
filt_colon_uni_metadata,
formula = '~ Group * Country + (1|Patient)')0 features are filtered!
The filtered data has 424 samples and 159 features will be tested!
Imputation approach is used.
Fit linear mixed effects models ...
Completed.
linda.output <- linda.obj$output
linda.output <- linda_renaming(linda.output, group)
# save the results
group1 <- paste0(group[1], " vs ","Group",group[2])
group2 <- paste0(group[1], " , ",group[2], " - CZ vs NO")
group3 <- paste0(group[1], " vs ","Group",group[2], ":CountryNO")
for (grp in c(group1,group2,group3)){
raw_linda_results_genus[[segment]][[grp]] <-
rawlinda.df(linda.output,
grp,
filt_colon_uni_data,
filt_colon_uni_taxa)
linda_results_genus[[segment]][[grp]] <-
linda.df(linda.output,
grp,
filt_colon_uni_data,
filt_colon_uni_taxa)
}# volcano plot
volcano_1 <- volcano_plot_linda(linda.output, group1,
taxa_table = filt_colon_uni_taxa) +
ggtitle(paste(group,collapse=" vs "))
volcano_2 <- volcano_plot_linda(linda.output, group2,
taxa_table = filt_colon_uni_taxa) +
ggtitle("NO vs CZ")
volcano_3 <- volcano_plot_linda(linda.output, group3,
taxa_table = filt_colon_uni_taxa) +
ggtitle("Interaction effect")
volcano <- ggarrange(volcano_1,volcano_2,volcano_3, ncol=3)
# see the plot
volcano3.4.1.2.3.2 MaAsLin2
volcano1 <- volcano_plot_maaslin(fit_data,filt_colon_uni_taxa) +
ggtitle(paste(group[1], "vs", group[2]))
volcano2 <- volcano_plot_maaslin(fit_data,filt_colon_uni_taxa,variable="Country") +
ggtitle("NO vs CZ")
volcano <- ggarrange(volcano1,volcano2, ncol=2)
volcano3.4.1.2.3.3 Group - Intersection
intersection_results <- group_intersection(group, list_intersections, list_venns,
linda.output, fit_data,
raw_linda_results_genus,
segment = segment,
level=level)
list_intersections <- intersection_results[[1]]
list_venns <- intersection_results[[2]]
venn <- intersection_results[[3]]
# show the results
venn3.4.1.2.3.4 Interaction effect
list_interaction_significant <- country_interaction(group,
linda.output,
list_intersections,
filt_colon_uni_data,
filt_colon_uni_metadata,
segment=segment,
level=level)
# see the result
## significant interaction effect
list_interaction_significant[[1]]NULL
## results for czech cohort
list_interaction_significant[[2]][1] NA
## results for norwegian cohort
list_interaction_significant[[3]][1] NA
Removing problematic taxa
list_intersections <- removing_interaction_problems(group,
list_interaction_significant,
list_intersections,
segment=segment,
level=level)3.4.1.2.4 Basic statistics
uni_df <- merge(basic_univariate_statistics(linda_data,group),
raw_linda_results_genus[[segment]][[group1]],
by="SeqID",all=TRUE)[1] "healthy"
[1] "non-rPSC"
[1] "healthy"
[1] "non-rPSC"
uni_df[["final_sig"]] <- uni_df$SeqID %in% list_intersections[[paste(segment,level,comparison_name)]][["SeqID"]]
uni_statistics[[segment]][[paste(level,comparison_name)]] <- uni_df
# for comparison
new_name <- comparison_name
addWorksheet(wb, sheetName = new_name)
writeData(wb, sheet = new_name, uni_df, rowNames=FALSE)3.4.1.2.5 Visualization
Heatmap visualizing the linDA’s logFoldChange for taxa with p < 0.1.
list_heatmap <- list_intersections[grep(paste(segment,level),
names(list_intersections),value=TRUE)]
p_heatmap_linda <- heatmap_linda(list_heatmap,colon_taxa_tab)
p_heatmap_lindaDot heatmap
dotheatmap_linda <- dot_heatmap_linda(list_heatmap,
uni_statistics$colon[grepl(level,names(uni_statistics$colon))],
colon_taxa_tab) + xlab("") + ylab("")min_clr -2.080021
max_clr 6.901023
min_log -4.181218
max_log 3.872081
dotheatmap_lindaHorizontal bar plot
p_prevalence <- horizontal_barplot(wb,taxa=levels(dotheatmap_linda$data$SeqID))Using SeqID as id variables
p_prevalence_final <- ggarrange(p_prevalence,
ggplot() + theme_minimal(),
nrow = 2,heights = c(1,0.085))
p <- ggarrange(dotheatmap_linda + theme(legend.position = "none"),p_prevalence_final,ncol=2,widths = c(1,0.3))
pdot_heatmap_colon <- p3.4.1.3 Saving results
# ALL DATA
saveWorkbook(wb,file.path(path,paste0("uni_analysis_wb_",segment,".xlsx")),
overwrite = TRUE)3.5 Machine learning
path = "../results/Q2/models"3.5.1 ElasticNet
model="enet"3.5.1.1 Genus level
level="genus"Aggregate taxa
genus_data <- aggregate_taxa(colon_asv_tab,
colon_taxa_tab,
taxonomic_level = level)
colon_genus_tab <- genus_data[[1]]
colon_genus_taxa_tab <- genus_data[[2]]3.5.1.1.1 rPSC vs non-rPSC - all
group <- c("rPSC","non-rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_colon_uni_data <- binomial_prep(colon_genus_tab,
colon_genus_taxa_tab,
colon_metadata,
group,
usage="ml_clr",
patient = TRUE)Removing 28 ASV(s)
Removing 5 ASV(s)
# fit the model
enet_model <- glmnet_binomial(filt_colon_uni_data,
sample_method = "atypboot",
outcome="Group",
N=500,
clust_var="Patient",
reuse=TRUE,
file=model_name,
Q="Q2")
# ROC
roc_c <- roc_curve(enet_model, group)
# save the results
models_summ[[model_name]] <- enet_model$model_summary
models_cm[[model_name]] <- enet_model$conf_matrices$original
roc_cs[[model_name]] <- enet_model$kfold_rocobjs
betas[[model_name]] <- as.matrix(enet_model$betas)
# see the results
enet_model$model_summary %>% t() [,1]
alpha 0.00000000
lambda 0.03550553
auc 0.99707181
auc_czech 0.99495342
auc_no 1.00000000
auc_optimism_corrected 0.62839790
auc_optimism_corrected_CIL 0.49485088
auc_optimism_corrected_CIU 0.76009870
accuracy 0.98571429
accuracy_czech NaN
accuracy_no 0.98581560
accuracy_optimism_corrected 0.70988684
accuracy_optimism_corrected_CIL 0.61932014
accuracy_optimism_corrected_CIU 0.80479243
enet_model$conf_matrices$original
Predicted
True 0 1
0 262 1
1 4 83
$czech
Predicted
True 0 1
0 160 1
1 2 46
$no
Predicted
True 0 1
0 102 0
1 2 37
enet_model$plot`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
roc_c3.5.1.1.2 rPSC vs non-rPSC - subset
group <- c("rPSC","non-rPSC")
comparison_name <- paste0(group[1], " vs ",group[2]," subset")taxa_to_test <- read.xlsx("../results/Q1/univariate_analysis/significant_taxa_colon.xlsx", sheet=" genus healthy vs pre_ltx")
model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_colon_uni_data <- binomial_prep_psc_effect(colon_genus_tab,
colon_genus_taxa_tab,
colon_metadata,
taxa_to_test,
usage="ml_clr",patient = TRUE)Removing 28 ASV(s)
Removing 5 ASV(s)
# fit the model
enet_model <- glmnet_binomial(filt_colon_uni_data,
sample_method = "atypboot",
outcome="Group",
N=500,
reuse=TRUE,clust_var = "Patient",
file=model_name,
Q="Q2")
# ROC
roc_c <- roc_curve(enet_model, group)
# save the results
models_summ[[model_name]] <- enet_model$model_summary
models_cm[[model_name]] <- enet_model$conf_matrices$original
roc_cs[[model_name]] <- enet_model$kfold_rocobjs
betas[[model_name]] <- as.matrix(enet_model$betas)
# see the results
enet_model$model_summary %>% t() [,1]
alpha 0.80000000
lambda 0.03321672
auc 0.83344259
auc_czech 0.78312629
auc_no 0.92383107
auc_optimism_corrected 0.65974941
auc_optimism_corrected_CIL 0.49897616
auc_optimism_corrected_CIU 0.79007241
accuracy 0.79714286
accuracy_czech NaN
accuracy_no 0.82269504
accuracy_optimism_corrected 0.73877807
accuracy_optimism_corrected_CIL 0.63177895
accuracy_optimism_corrected_CIU 0.82893728
enet_model$conf_matrices$original
Predicted
True 0 1
0 258 5
1 66 21
$czech
Predicted
True 0 1
0 156 5
1 41 7
$no
Predicted
True 0 1
0 102 0
1 25 14
enet_model$plot`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
roc_c3.5.1.2 Saving results
models_summ_df_colon <- do.call(rbind,
models_summ[grep(segment,names(models_summ),value = TRUE)])
write.csv(models_summ_df_colon,file.path(path,paste0("elastic_net_",segment,".csv")))3.5.2 Supplementary models
3.5.2.1 CLR-transformed data
3.5.2.1.1 kNN
model="knn"3.5.2.1.1.1 Genus level
level="genus"Aggregate taxa
genus_data <- aggregate_taxa(colon_asv_tab,
colon_taxa_tab,
taxonomic_level = level)
colon_genus_tab <- genus_data[[1]]
colon_genus_taxa_tab <- genus_data[[2]]rPSC vs non-rPSC - all
group <- c("rPSC","non-rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_colon_uni_data <- binomial_prep(colon_genus_tab,
colon_genus_taxa_tab,
colon_metadata,
group,
usage="ml_clr",
patient = TRUE)Removing 28 ASV(s)
Removing 5 ASV(s)
# fit the model
knn_model <- knn_binomial(filt_colon_uni_data,
sample_method = "atypboot",
outcome="Group",
N=500,
clust_var="Patient",
reuse=TRUE,
file=model_name,
Q="Q2")
# ROC curve
roc_c <- roc_curve(knn_model, group)
# save the results
supplements_models[["models_summ"]][["knn_model"]][[model_name]] <- knn_model$model_summary
supplements_models[["roc_cs"]][["knn_model"]][[model_name]] <- knn_model$kfold_rocobjs
# see the results
knn_model$model_summary %>% t() [,1]
k 12.0000000
auc 0.9046370
auc_optimism_corrected 0.6077603
auc_optimism_corrected_CIL 0.4624559
auc_optimism_corrected_CIU 0.7378034
accuracy 0.7771429
accuracy_optimism_corrected 0.7330057
accuracy_optimism_corrected_CIL 0.6357187
accuracy_optimism_corrected_CIU 0.8225315
roc_crPSC vs non-rPSC - subset
group <- c("rPSC","non-rPSC")
comparison_name <- paste0(group[1], " vs ",group[2]," subset")taxa_to_test <- read.xlsx("../results/Q1/univariate_analysis/significant_taxa_colon.xlsx", sheet=" genus healthy vs pre_ltx")
model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_ileum_uni_data <- binomial_prep_psc_effect(colon_genus_tab,
colon_genus_taxa_tab,
colon_metadata,
taxa_to_test,
usage="ml_clr",
patient = TRUE)Removing 28 ASV(s)
Removing 5 ASV(s)
# fit the model
knn_model <- knn_binomial(filt_ileum_uni_data,
sample_method = "atypboot",
outcome="Group",
N=500,
reuse=TRUE,
file=model_name,
Q="Q2",clust_var="Patient")
# ROC
roc_c <- roc_curve(knn_model, group)
# save the results
supplements_models[["models_summ"]][["knn_model"]][[model_name]] <- knn_model$model_summary
supplements_models[["roc_cs"]][["knn_model"]][[model_name]] <- knn_model$kfold_rocobjs
# see the results
knn_model$model_summary %>% t() [,1]
k 11.0000000
auc 0.8750273
auc_optimism_corrected 0.6100785
auc_optimism_corrected_CIL 0.4799872
auc_optimism_corrected_CIU 0.7402661
accuracy 0.7828571
accuracy_optimism_corrected 0.7192504
accuracy_optimism_corrected_CIL 0.6296117
accuracy_optimism_corrected_CIU 0.8048780
roc_c3.5.2.1.2 Random Forest
model="rf"3.5.2.1.2.1 Genus level
level="genus"Aggregate taxa
genus_data <- aggregate_taxa(colon_asv_tab,
colon_taxa_tab,
taxonomic_level = level)
colon_genus_tab <- genus_data[[1]]
colon_genus_taxa_tab <- genus_data[[2]]rPSC vs non-rPSC - all
group <- c("rPSC","non-rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_colon_uni_data <- binomial_prep(colon_genus_tab,
colon_genus_taxa_tab,
colon_metadata,
group,
usage="ml_clr",
patient = TRUE)Removing 28 ASV(s)
Removing 5 ASV(s)
# fit the model
rf_model <- rf_binomial(filt_colon_uni_data,
sample_method = "atypboot",
outcome="Group",
N=500,
clust_var="Patient",
reuse=TRUE,
file=model_name,
Q="Q2")
# ROC curve
roc_c <- roc_curve(rf_model, group)
# save the results
supplements_models[["models_summ"]][["rf_model"]][[model_name]] <- rf_model$model_summary
supplements_models[["roc_cs"]][["rf_model"]][[model_name]] <- rf_model$kfold_rocobjs
# see the results
rf_model$model_summary %>% t() [,1]
mtry "61"
splitrule "gini"
min.node.size "2"
auc "1"
auc_optimism_corrected "0.647396"
auc_optimism_corrected_CIL "0.5281583"
auc_optimism_corrected_CIU "0.7763141"
accuracy "1"
accuracy_optimism_corrected "0.7373451"
accuracy_optimism_corrected_CIL "0.6285415"
accuracy_optimism_corrected_CIU "0.8257258"
roc_crPSC vs non-rPSC - subset
group <- c("rPSC","non-rPSC")
comparison_name <- paste0(group[1], " vs ",group[2]," subset")taxa_to_test <- read.xlsx("../results/Q1/univariate_analysis/significant_taxa_colon.xlsx", sheet=" genus healthy vs pre_ltx")
model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_ileum_uni_data <- binomial_prep_psc_effect(colon_genus_tab,
colon_genus_taxa_tab,
colon_metadata,
taxa_to_test,
usage="ml_clr",
patient = TRUE)Removing 28 ASV(s)
Removing 5 ASV(s)
# fit the model
rf_model <- rf_binomial(filt_ileum_uni_data,
sample_method = "atypboot",
outcome="Group",
N=500,
reuse=TRUE,
file=model_name,
Q="Q2",
clust_var="Patient")
# ROC curve
roc_c <- roc_curve(rf_model, group)
# save the results
supplements_models[["models_summ"]][["rf_model"]][[model_name]] <- rf_model$model_summary
supplements_models[["roc_cs"]][["rf_model"]][[model_name]] <- rf_model$kfold_rocobjs
# see the results
rf_model$model_summary %>% t() [,1]
mtry "13"
splitrule "gini"
min.node.size "2"
auc "1"
auc_optimism_corrected "0.6442126"
auc_optimism_corrected_CIL "0.5114036"
auc_optimism_corrected_CIU "0.7678272"
accuracy "1"
accuracy_optimism_corrected "0.7373046"
accuracy_optimism_corrected_CIL "0.6338531"
accuracy_optimism_corrected_CIU "0.8252843"
roc_c3.5.2.1.3 Gradient boosting
model="gb"3.5.2.1.3.1 Genus level
level="genus"Aggregate taxa
genus_data <- aggregate_taxa(colon_asv_tab,
colon_taxa_tab,
taxonomic_level = level)
colon_genus_tab <- genus_data[[1]]
colon_genus_taxa_tab <- genus_data[[2]]rPSC vs non-rPSC - all
group <- c("rPSC","non-rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_colon_uni_data <- binomial_prep(colon_genus_tab,
colon_genus_taxa_tab,
colon_metadata,
group,
usage="ml_clr",
patient = TRUE)Removing 28 ASV(s)
Removing 5 ASV(s)
# fit the model
gbm_model <- gbm_binomial(filt_colon_uni_data,
sample_method = "atypboot",
outcome="Group",
N=500,
clust_var="Patient",
reuse=TRUE,
file=model_name,
Q="Q2")
# ROC curve
roc_c <- roc_curve(gbm_model, group)
# save the results
supplements_models[["models_summ"]][["gbm_model"]][[model_name]] <- gbm_model$model_summary
supplements_models[["roc_cs"]][["gbm_model"]][[model_name]] <- gbm_model$kfold_rocobjs
# see the results
gbm_model$model_summary %>% t() [,1]
n.trees 500.0000000
interaction.depth 3.0000000
shrinkage 0.1000000
n.minobsinnode 10.0000000
auc 1.0000000
auc_optimism_corrected 0.6573363
auc_optimism_corrected_CIL 0.5326281
auc_optimism_corrected_CIU 0.7805540
accuracy 1.0000000
accuracy_optimism_corrected 0.7392522
accuracy_optimism_corrected_CIL 0.6438517
accuracy_optimism_corrected_CIU 0.8320565
roc_c**rPSC vs non-rPSC - subset*
group <- c("rPSC","non-rPSC")
comparison_name <- paste0(group[1], " vs ",group[2]," subset")taxa_to_test <- read.xlsx("../results/Q1/univariate_analysis/significant_taxa_colon.xlsx", sheet=" genus healthy vs pre_ltx")
model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_colon_uni_data <- binomial_prep_psc_effect(colon_genus_tab,
colon_genus_taxa_tab,
colon_metadata,
taxa_to_test,
usage="ml_clr",
patient = TRUE)Removing 28 ASV(s)
Removing 5 ASV(s)
# fit the model
gbm_model <- gbm_binomial(filt_colon_uni_data,
sample_method = "atypboot",
outcome="Group",
N=500,
reuse=TRUE,
file=model_name,
Q="Q2",
clust_var = "Patient")
# ROC curve
roc_c <- roc_curve(gbm_model, group)
# save the results
supplements_models[["models_summ"]][["gbm_model"]][[model_name]] <- gbm_model$model_summary
supplements_models[["roc_cs"]][["gbm_model"]][[model_name]] <- gbm_model$kfold_rocobjs
# see the results
gbm_model$model_summary %>% t() [,1]
n.trees 500.0000000
interaction.depth 3.0000000
shrinkage 0.1000000
n.minobsinnode 20.0000000
auc 1.0000000
auc_optimism_corrected 0.6539837
auc_optimism_corrected_CIL 0.5040926
auc_optimism_corrected_CIU 0.7768911
accuracy 1.0000000
accuracy_optimism_corrected 0.7271262
accuracy_optimism_corrected_CIL 0.6299363
accuracy_optimism_corrected_CIU 0.8130202
roc_c3.5.2.2 Saving results
models_list <- list()
for (model_name in names(supplements_models$models_summ)){
df <- do.call(rbind, supplements_models$models_summ[[model_name]])
models_list[[model_name]] <- df
}
write.xlsx(models_list,
file=file.path(path,paste0("supplements_models_",segment,".xlsx")),
rowNames=TRUE)3.6 Results overview
3.6.0.1 Alpha diversity
Linear models
knitr::kable(pc_observed[[segment]],
digits = 3,
caption = "Results of linear model testing ASV Richness")| Estimate | Std..Error | df | t.value | Pr…t.. | p.adj | sig | |
|---|---|---|---|---|---|---|---|
| non-rPSC vs GrouprPSC | -17.252 | 12.469 | 156.883 | -1.384 | 0.168 | 0.192 | |
| non-rPSC vs rPSC - CZ vs NO | -17.941 | 10.644 | 153.303 | -1.686 | 0.094 | 0.192 | |
| non-rPSC vs GrouprPSC:CountryNO | 3.483 | 21.371 | 152.108 | 0.163 | 0.871 | 0.871 | |
| healthy vs GrouprPSC | -40.307 | 11.924 | 125.516 | -3.380 | 0.001 | 0.009 | ** |
| healthy vs rPSC - CZ vs NO | 14.783 | 10.544 | 126.613 | 1.402 | 0.163 | 0.192 | |
| healthy vs GrouprPSC:CountryNO | -29.232 | 19.769 | 123.028 | -1.479 | 0.142 | 0.192 | |
| healthy vs Groupnon-rPSC | -22.976 | 8.905 | 211.354 | -2.580 | 0.011 | 0.047 | * |
| healthy vs non-rPSC - CZ vs NO | 14.961 | 10.892 | 212.956 | 1.374 | 0.171 | 0.192 | |
| healthy vs Groupnon-rPSC:CountryNO | -32.911 | 14.701 | 208.353 | -2.239 | 0.026 | 0.079 |
knitr::kable(pc_shannon[[segment]],
digits = 3,
caption = "Results of linear model testing Shannon index")| Estimate | Std..Error | df | t.value | Pr…t.. | p.adj | sig | |
|---|---|---|---|---|---|---|---|
| non-rPSC vs GrouprPSC | -0.146 | 0.161 | 154.485 | -0.907 | 0.366 | 0.549 | |
| non-rPSC vs rPSC - CZ vs NO | -0.355 | 0.138 | 151.788 | -2.581 | 0.011 | 0.051 | |
| non-rPSC vs GrouprPSC:CountryNO | 0.008 | 0.277 | 150.856 | 0.030 | 0.976 | 0.976 | |
| healthy vs GrouprPSC | -0.330 | 0.128 | 126.201 | -2.572 | 0.011 | 0.051 | |
| healthy vs rPSC - CZ vs NO | 0.017 | 0.113 | 127.217 | 0.146 | 0.884 | 0.976 | |
| healthy vs GrouprPSC:CountryNO | -0.364 | 0.213 | 123.907 | -1.709 | 0.090 | 0.162 | |
| healthy vs Groupnon-rPSC | -0.182 | 0.101 | 207.051 | -1.795 | 0.074 | 0.162 | |
| healthy vs non-rPSC - CZ vs NO | 0.018 | 0.124 | 208.617 | 0.145 | 0.884 | 0.976 | |
| healthy vs Groupnon-rPSC:CountryNO | -0.375 | 0.168 | 204.129 | -2.240 | 0.026 | 0.078 |
knitr::kable(pc_simpson[[segment]],
digits = 3,
caption = "Results of linear model testing Simpson index")| Estimate | Std..Error | df | t.value | Pr…t.. | p.adj | sig | |
|---|---|---|---|---|---|---|---|
| non-rPSC vs GrouprPSC | -0.005 | 0.029 | 153.608 | -0.169 | 0.866 | 0.866 | |
| non-rPSC vs rPSC - CZ vs NO | -0.032 | 0.024 | 151.251 | -1.296 | 0.197 | 0.443 | |
| non-rPSC vs GrouprPSC:CountryNO | -0.027 | 0.049 | 150.427 | -0.554 | 0.581 | 0.806 | |
| healthy vs GrouprPSC | -0.031 | 0.020 | 127.660 | -1.523 | 0.130 | 0.391 | |
| healthy vs rPSC - CZ vs NO | -0.007 | 0.018 | 128.411 | -0.402 | 0.688 | 0.806 | |
| healthy vs GrouprPSC:CountryNO | -0.052 | 0.034 | 125.982 | -1.540 | 0.126 | 0.391 | |
| healthy vs Groupnon-rPSC | -0.026 | 0.016 | 201.924 | -1.600 | 0.111 | 0.391 | |
| healthy vs non-rPSC - CZ vs NO | -0.007 | 0.020 | 203.469 | -0.363 | 0.717 | 0.806 | |
| healthy vs Groupnon-rPSC:CountryNO | -0.025 | 0.026 | 199.046 | -0.952 | 0.342 | 0.616 |
knitr::kable(pc_pielou[[segment]],
digits = 3,
caption = "Results of linear model testing Pielou index")| Estimate | Std..Error | df | t.value | Pr…t.. | p.adj | sig | |
|---|---|---|---|---|---|---|---|
| non-rPSC vs GrouprPSC | -0.007 | 0.026 | 153.863 | -0.292 | 0.771 | 0.843 | |
| non-rPSC vs rPSC - CZ vs NO | -0.052 | 0.022 | 150.735 | -2.372 | 0.019 | 0.171 | |
| non-rPSC vs GrouprPSC:CountryNO | -0.009 | 0.044 | 149.674 | -0.198 | 0.843 | 0.843 | |
| healthy vs GrouprPSC | -0.023 | 0.020 | 127.665 | -1.183 | 0.239 | 0.538 | |
| healthy vs rPSC - CZ vs NO | -0.008 | 0.017 | 129.013 | -0.457 | 0.648 | 0.843 | |
| healthy vs GrouprPSC:CountryNO | -0.052 | 0.033 | 124.565 | -1.605 | 0.111 | 0.333 | |
| healthy vs Groupnon-rPSC | -0.015 | 0.015 | 204.651 | -1.006 | 0.316 | 0.568 | |
| healthy vs non-rPSC - CZ vs NO | -0.008 | 0.019 | 206.473 | -0.416 | 0.678 | 0.843 | |
| healthy vs Groupnon-rPSC:CountryNO | -0.044 | 0.025 | 201.171 | -1.745 | 0.083 | 0.333 |
Plots
alpha_div_plots[[paste(segment,"Country")]]3.6.0.2 Beta diversity
Main results
knitr::kable(pairwise_aitchison_raw[[paste("genus", segment)]],
digits = 3,
caption = "Results of PERMANOVA - robust aitchison distance")| pairs | Df | SumsOfSqs | F.Model | R2 | p.value | p.adjusted | sig |
|---|---|---|---|---|---|---|---|
| rPSC vs non-rPSC | 1 | 419.973 | 2.475 | 0.007 | 0.415 | 0.415 | |
| rPSC vs healthy | 1 | 1473.966 | 8.988 | 0.035 | 0.001 | 0.002 | ** |
| non-rPSC vs healthy | 1 | 1742.331 | 10.401 | 0.024 | 0.001 | 0.002 | ** |
| rPSC vs non-rPSC , Country | 1 | 1556.933 | 9.177 | 0.026 | 0.001 | 0.001 | *** |
| rPSC vs healthy , Country | 1 | 883.284 | 5.386 | 0.021 | 0.001 | 0.001 | *** |
| non-rPSC vs healthy , Country | 1 | 1558.672 | 9.304 | 0.021 | 0.001 | 0.001 | *** |
| rPSC vs non-rPSC : Country | 1 | 318.175 | 1.880 | 0.005 | 0.970 | 0.970 | |
| rPSC vs healthy : Country | 1 | 319.594 | 1.956 | 0.008 | 0.820 | 0.970 | |
| non-rPSC vs healthy : Country | 1 | 407.191 | 2.439 | 0.006 | 0.163 | 0.489 |
PCA
pca_plots_list[[paste(segment,"genus custom")]]Supplements
knitr::kable(
supplements_beta[!grepl("PCoA",names(supplements_beta)) &
(grepl("genus",names(supplements_beta))) &
(grepl(segment,names(supplements_beta)))],
digits = 3,
caption = "Supplementary PERMANOVA results: Bray-curtis, Jaccard distances")
|
|
Supplementary PCA
plot_list <- supplements_beta[grepl("PCoA",names(supplements_beta)) &
grepl(segment,names(supplements_beta))]
ggarrange(plotlist = plot_list,
labels=names(plot_list),
font.label = list(size=5,face="plain"),
ncol=2,nrow=3)3.6.0.3 Univariate analysis
Number of significant taxa
knitr::kable(as.data.frame(lapply(list_intersections,nrow)) %>% t() %>%
`colnames<-`("Count") )| Count | |
|---|---|
| terminal_ileum.genus.non.rPSC.vs.rPSC | 0 |
| terminal_ileum.genus.healthy.vs.rPSC | 34 |
| terminal_ileum.genus.healthy.vs.non.rPSC | 25 |
| colon.genus.non.rPSC.vs.rPSC | 0 |
| colon.genus.healthy.vs.rPSC | 56 |
| colon.genus.healthy.vs.non.rPSC | 30 |
3.6.0.4 Machine learning
Main models
knitr::kable(models_summ_df_colon %>% dplyr::select(
"alpha","lambda",
"auc_optimism_corrected",
"auc_optimism_corrected_CIL",
"auc_optimism_corrected_CIU"),
digits=3,caption="Elastic net results")| alpha | lambda | auc_optimism_corrected | auc_optimism_corrected_CIL | auc_optimism_corrected_CIU | |
|---|---|---|---|---|---|
| rPSC vs non-rPSC genus colon | 0.0 | 0.036 | 0.628 | 0.495 | 0.76 |
| rPSC vs non-rPSC subset genus colon | 0.8 | 0.033 | 0.660 | 0.499 | 0.79 |
ROC - Genus level
roc_curve_all_custom(roc_cs[3:4],Q="Q2",
model_name="enet_model",legend=FALSE)Supplementary models
# Build final dataframe
models_list[["enet_model"]] <- rbind(models_summ_df_ileum,models_summ_df_colon)
final_df <- tibble(row_names = rownames(models_list[[1]]))
# Loop through models and extract required values
for (model_name in names(models_list)) {
model_df <- models_list[[model_name]]
# Combine AUC_optimism_corrected with its CI values
final_df[[model_name]] <- paste0(
round(model_df$auc_optimism_corrected, 3),
" (", round(model_df$auc_optimism_corrected_CIL, 3), "; ",
round(model_df$auc_optimism_corrected_CIU, 3), ")"
)
}
knitr::kable(final_df, caption="All models")| row_names | knn_model | rf_model | gbm_model | enet_model |
|---|---|---|---|---|
| rPSC vs non-rPSC genus terminal_ileum | 0.518 (0.321; 0.685) | 0.576 (0.383; 0.756) | 0.548 (0.391; 0.699) | 0.574 (0.391; 0.732) |
| rPSC vs non-rPSC subset genus terminal_ileum | 0.581 (0.412; 0.738) | 0.634 (0.472; 0.781) | 0.613 (0.435; 0.764) | 0.56 (0.386; 0.723) |
| rPSC vs non-rPSC genus colon | 0.608 (0.462; 0.738) | 0.647 (0.528; 0.776) | 0.657 (0.533; 0.781) | 0.628 (0.495; 0.76) |
| rPSC vs non-rPSC subset genus colon | 0.61 (0.48; 0.74) | 0.644 (0.511; 0.768) | 0.654 (0.504; 0.777) | 0.66 (0.499; 0.79) |
write.csv(final_df,file=file.path(path,"AUC_all_models.csv"),row.names = FALSE)Supplementary ROC - genus
rocs_list <- supplements_models$roc_cs
rocs_list[["enet_model"]] <- roc_cs
plot_list <- list()
for (model_name in names(rocs_list)) {
plot_list[[model_name]] <- roc_curve_all_custom(rocs_list[[model_name]][c(2:3)],
Q="Q2",
model_name=model_name,legend=FALSE)
}
p <- ggarrange(plotlist = plot_list,labels = names(rocs_list),font.label = list(face="plain",size=7))
p4 Paper-ready visualizations
4.1 Alpha diversity
p_A <- alpha_div_plots$`terminal_ileum Country` +
ggtitle("Terminal ileum")+
theme(plot.title = element_text(hjust=0.5,face = "bold",size = 15))
p_B <- alpha_div_plots$`colon Country` +
ggtitle("Colon") +
theme(plot.title = element_text(hjust=0.5,face = "bold",size = 15))
Q2_alpha <- ggarrange(p_A,ggplot() + theme_minimal(),p_B,nrow=1, ncol=3,
widths = c(1,0.1,1))
Q2_alpha4.2 Beta diversity
pca_ti <- pca_plots_list$`terminal_ileum genus custom`
pca_colon <- pca_plots_list$`colon genus custom`
genus_Q2_beta <- ggarrange(pca_ti,
ggplot() + theme_minimal(),
pca_colon,ncol=3,
widths = c(1,0.1,1))
genus_Q2_betaAlpha + Beta diversity
alpha_beta <- ggarrange(Q2_alpha,genus_Q2_beta,
ncol = 1,nrow=2,labels = c("A","B"))
alpha_beta4.3 Elastic net
models_to_plot <- c("knn_model","rf_model","gbm_model","enet_model")
names(models_to_plot) <- c("kNN","RF","GBoost","ENet")
models_to_plot <- c("enet_model")
#names(models_to_plot) <- c("kNN","RF","GBoost","ENet")
# ILEUM
plot_list_ileum <- list()
for (model_name in models_to_plot) {
plot_list_ileum[[model_name]] <-
roc_curve_all_custom(rocs_list[[model_name]][c(1:2)],
Q="Q2",
model_name=model_name,legend = FALSE) +
#ggtitle(names(models_to_plot)[which(model_name==models_to_plot)]) +
theme(plot.title = element_text(face = "bold",size = 10))
}
roc_curve_ti <- ggarrange(plotlist = plot_list_ileum)
# COLON
plot_list_colon <- list()
for (model_name in models_to_plot) {
plot_list_colon[[model_name]] <-
roc_curve_all_custom(rocs_list[[model_name]][c(3:4)],
Q="Q2",
model_name=model_name,legend = FALSE) +
#ggtitle(names(models_to_plot)[which(model_name==models_to_plot)]) +
theme(plot.title = element_text(face = "bold",size = 10))
}
roc_curve_colon <- ggarrange(plotlist = plot_list_colon)
roc_curve_plot <- ggarrange(roc_curve_ti,
ggplot() + theme_minimal(),
roc_curve_colon,
ncol=3, widths = c(1,0.1,1))
roc_curve_plot4.4 Figure 4
alpha_beta_elastic <- ggarrange(Q2_alpha,genus_Q2_beta,roc_curve_plot,
ncol = 1,nrow=3,labels = LETTERS,heights = c(1,1,0.8))
alpha_beta_elasticalpha_beta_elastic <- ggarrange(alpha_beta_elastic,ggplot() + theme_minimal(),ncol=2,
widths = c(1,0.15))
alpha_beta_elastic4.5 Supplementary Figure - DAA
p_ileum <- dot_heatmap_ileum +
ggtitle("Terminal ileum") +
theme(plot.title = element_text(hjust=0.5,face = "bold",size = 15),
legend.position = "none")
p_colon <- dot_heatmap_colon +
ggtitle("Colon") +
theme(plot.title = element_text(hjust=0.5,face = "bold",size = 15),
legend.position = "none")
heatmap_plot <- ggarrange(p_ileum,
p_colon,
ncol = 2,labels=c("A","B"))
heatmap_plot5 Session info
sessionInfo()R version 4.3.1 (2023-06-16 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 11 x64 (build 26100)
Matrix products: default
locale:
[1] LC_COLLATE=Czech_Czechia.utf8 LC_CTYPE=Czech_Czechia.utf8
[3] LC_MONETARY=Czech_Czechia.utf8 LC_NUMERIC=C
[5] LC_TIME=Czech_Czechia.utf8
time zone: Europe/Prague
tzcode source: internal
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] mice_3.16.0 picante_1.8.2 ape_5.8
[4] lubridate_1.9.3 forcats_1.0.0 stringr_1.5.1
[7] tidyverse_2.0.0 kableExtra_1.4.0 tidyr_1.3.1
[10] gbm_2.2.2 doParallel_1.0.17 iterators_1.0.14
[13] foreach_1.5.2 ranger_0.17.0 ggvenn_0.1.15
[16] Maaslin2_1.16.0 purrr_1.0.2 pROC_1.18.5
[19] glmnet_4.1-8 MicrobiomeStat_1.2 caret_6.0-94
[22] openxlsx_4.2.6.1 magrittr_2.0.3 emmeans_1.10.7
[25] lmerTest_3.1-3 robustlmm_3.3-1 lme4_1.1-35.5
[28] Matrix_1.6-5 mgcv_1.9-1 nlme_3.1-164
[31] pheatmap_1.0.12 reshape2_1.4.4 vegan_2.6-4
[34] lattice_0.22-6 permute_0.9-7 ggplotify_0.1.2
[37] ggrepel_0.9.5 ggpubr_0.6.0 MicrobiotaProcess_1.14.1
[40] phyloseq_1.46.0 ggplot2_3.5.1 tibble_3.2.1
[43] dplyr_1.1.4 cowplot_1.1.3 readr_2.1.5
[46] igraph_2.0.3 ccrepe_1.38.1 data.table_1.15.4
loaded via a namespace (and not attached):
[1] fs_1.6.4 matrixStats_1.3.0
[3] bitops_1.0-7 RColorBrewer_1.1-3
[5] numDeriv_2016.8-1.1 tools_4.3.1
[7] backports_1.4.1 R6_2.6.1
[9] lazyeval_0.2.2 jomo_2.7-6
[11] rhdf5filters_1.14.1 withr_3.0.2
[13] gridExtra_2.3 cli_3.6.2
[15] Biobase_2.62.0 logging_0.10-108
[17] biglm_0.9-3 sandwich_3.1-1
[19] labeling_0.4.3 mvtnorm_1.2-4
[21] robustbase_0.99-3 pbapply_1.7-2
[23] systemfonts_1.1.0 yulab.utils_0.2.0
[25] svglite_2.1.3 parallelly_1.38.0
[27] rstudioapi_0.17.1 generics_0.1.3
[29] gridGraphics_0.5-1 shape_1.4.6.1
[31] car_3.1-3 zip_2.3.1
[33] biomformat_1.30.0 S4Vectors_0.40.2
[35] abind_1.4-8 infotheo_1.2.0.1
[37] lifecycle_1.0.4 multcomp_1.4-28
[39] yaml_2.3.8 carData_3.0-5
[41] SummarizedExperiment_1.32.0 Rtsne_0.17
[43] rhdf5_2.46.1 recipes_1.1.1
[45] SparseArray_1.2.4 grid_4.3.1
[47] mitml_0.4-5 crayon_1.5.3
[49] pillar_1.10.1 knitr_1.50
[51] optparse_1.7.5 GenomicRanges_1.54.1
[53] statip_0.2.3 boot_1.3-31
[55] estimability_1.5.1 future.apply_1.11.3
[57] codetools_0.2-20 pan_1.9
[59] glue_1.7.0 ggfun_0.1.8
[61] vctrs_0.6.5 treeio_1.26.0
[63] gtable_0.3.6 maditr_0.8.4
[65] gower_1.0.1 xfun_0.51
[67] S4Arrays_1.2.1 prodlim_2024.06.25
[69] libcoin_1.0-10 coda_0.19-4.1
[71] pcaPP_2.0-4-1 modeest_2.4.0
[73] survival_3.5-8 timeDate_4041.110
[75] hardhat_1.4.1 lava_1.8.1
[77] statmod_1.5.0 TH.data_1.1-3
[79] ipred_0.9-15 ggtree_3.10.1
[81] GenomeInfoDb_1.38.8 fBasics_4032.96
[83] rpart_4.1.23 colorspace_2.1-0
[85] BiocGenerics_0.48.1 DBI_1.2.3
[87] nnet_7.3-19 ade4_1.7-22
[89] tidyselect_1.2.1 timeSeries_4041.111
[91] compiler_4.3.1 microbiome_1.24.0
[93] xml2_1.3.6 DelayedArray_0.28.0
[95] scales_1.3.0 DEoptimR_1.1-3-1
[97] spatial_7.3-17 digest_0.6.35
[99] minqa_1.2.8 rmarkdown_2.29
[101] XVector_0.42.0 htmltools_0.5.8.1
[103] pkgconfig_2.0.3 MatrixGenerics_1.14.0
[105] stabledist_0.7-2 fastmap_1.2.0
[107] rlang_1.1.3 htmlwidgets_1.6.4
[109] farver_2.1.2 zoo_1.8-12
[111] jsonlite_1.8.8 ModelMetrics_1.2.2.2
[113] RCurl_1.98-1.14 modeltools_0.2-23
[115] Formula_1.2-5 GenomeInfoDbData_1.2.11
[117] patchwork_1.3.0 Rhdf5lib_1.24.2
[119] munsell_0.5.1 Rcpp_1.0.12
[121] ggnewscale_0.5.1 fastGHQuad_1.0.1
[123] stringi_1.8.3 ggstar_1.0.4
[125] stable_1.1.6 zlibbioc_1.48.2
[127] MASS_7.3-60.0.1 plyr_1.8.9
[129] listenv_0.9.1 Biostrings_2.70.3
[131] splines_4.3.1 hash_2.2.6.3
[133] multtest_2.58.0 hms_1.1.3
[135] ggtreeExtra_1.12.0 ggsignif_0.6.4
[137] stats4_4.3.1 rmutil_1.1.10
[139] evaluate_1.0.3 nloptr_2.1.1
[141] tzdb_0.4.0 getopt_1.20.4
[143] future_1.34.0 clue_0.3-65
[145] coin_1.4-3 broom_1.0.7
[147] xtable_1.8-4 tidytree_0.4.6
[149] rstatix_0.7.2 viridisLite_0.4.2
[151] class_7.3-22 aplot_0.2.5
[153] IRanges_2.36.0 cluster_2.1.6
[155] timechange_0.3.0 globals_0.16.3